date:20191127

Re: [PATCH 8/9] monitor: move hmp_info_block* to blockdev-hmp-cmds.c

2019-11-27 Thread Markus Armbruster

I think it makes sense to collect *all* block HMP stuff here.

Left in monitor/hmp-cmds.c: hmp_eject(), hmp_nbd_server_start(), ...

I guess hmp_change() has to stay there, because it's both block and ui.

Left in blockdev.c: hmp_drive_add_node().

Quick grep for possible files to check:

$ git-grep -l 'monitor[a-z_-]*.h' | xargs grep -l 'block[a-z_-]*\.h'
MAINTAINERS
blockdev-hmp-cmds.c
blockdev.c
cpus.c
dump/dump.c
hw/display/qxl.c
hw/scsi/vhost-scsi.c
hw/usb/dev-storage.c
include/monitor/monitor.h
migration/migration.c
monitor/hmp-cmds.c
monitor/hmp.c
monitor/misc.c
monitor/qmp-cmds.c
qdev-monitor.c
vl.c

Re: [RESEND PATCH v21 2/6] docs: APEI GHES generation and CPER record description

2019-11-27 Thread Igor Mammedov

On Wed, 27 Nov 2019 09:37:57 +0800
Xiang Zheng wrote:

> Hi Igor,
> 
> Thanks for your review!
> Since the series of patches are going to be merged, we will address your 
> comments by follow up patches.

Yes, I know it's quite frustrating to respin a series multiple times,
but on the other hand it's more frustrating later on when a reader
tries to figure out the mess caused by a bunch of fixups in the commit
history.

Given the number of issues spotted during review, some of which require
rewriting patches, I don't see a big vXX as a valid reason
to merge without another compelling reason, especially at
the beginning of the merge window.
(It might be fine right before soft-freeze if the issues are minor,
but that is not the case here.)

If I were you, I'd just respin v22 with comments addressed.
(from my side I can promise to review it shortly after that,
while I still remember how it works)

[...]

[PATCH v18 1/8] numa: Extend CLI to provide initiator information for numa nodes

2019-11-27 Thread Tao Xu

In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
the initiator represents a processor which accesses memory. In 5.2.27.3
Memory Proximity Domain Attributes Structure, the attached initiator is
defined as the place where the memory controller responsible for a memory
proximity domain is attached. With attached initiator information, the
topology of heterogeneous memory can be described.

Extend CLI of "-numa node" option to indicate the initiator numa node-id.
In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.
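
A minimal sketch of the initiator rules this patch enforces when hmat=on,
written as standalone C rather than QEMU code. SketchNode, SKETCH_MAX_NODES
and sketch_validate_initiators() are hypothetical names used only for
illustration; the three checks (initiator set, initiator node declared,
initiator node has a CPU) follow numa_validate_initiator() in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_MAX_NODES 128   /* assumed stand-in for QEMU's MAX_NODES */

/* Hypothetical, simplified stand-in for QEMU's NodeInfo. */
typedef struct {
    bool present;        /* node was declared with "-numa node" */
    bool has_cpu;        /* at least one CPU is assigned to the node */
    uint16_t initiator;  /* SKETCH_MAX_NODES means "not set" */
} SketchNode;

/*
 * Returns 0 if every node has a usable initiator, otherwise a negative
 * code identifying the first check that failed.
 */
static int sketch_validate_initiators(const SketchNode *nodes, int num_nodes)
{
    assert(num_nodes >= 0 && num_nodes <= SKETCH_MAX_NODES);

    for (int i = 0; i < num_nodes; i++) {
        uint16_t init = nodes[i].initiator;

        if (init == SKETCH_MAX_NODES) {
            return -1;  /* initiator was never specified */
        }
        if (init >= num_nodes || !nodes[init].present) {
            return -2;  /* initiator points at an undeclared node */
        }
        if (!nodes[init].has_cpu) {
            return -3;  /* initiator node contains no CPU */
        }
    }
    return 0;
}
```

For a two-node setup where node 0 holds the CPUs and node 1 is memory-only,
node 1 has to name node 0 as its initiator for the validation to pass.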

Reviewed-by: Igor Mammedov 
Reviewed-by: Jingqi Liu 
Suggested-by: Dan Williams 
Signed-off-by: Tao Xu 
---

No changes in v18.

Changes in v15:
- Change the QAPI version tag to 5.0 (Eric)
---
 hw/core/machine.c | 64 +++
 hw/core/numa.c| 23 
 include/sysemu/numa.h |  5 
 qapi/machine.json | 10 ++-
 qemu-options.hx   | 35 +++
 5 files changed, 131 insertions(+), 6 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 1689ad3bf8..d7d2cfa66d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -518,6 +518,20 @@ static void machine_set_nvdimm(Object *obj, bool value, Error **errp)
 ms->nvdimms_state->is_enabled = value;
 }
 
+static bool machine_get_hmat(Object *obj, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+return ms->numa_state->hmat_enabled;
+}
+
+static void machine_set_hmat(Object *obj, bool value, Error **errp)
+{
+MachineState *ms = MACHINE(obj);
+
+ms->numa_state->hmat_enabled = value;
+}
+
 static char *machine_get_nvdimm_persistence(Object *obj, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
@@ -645,6 +659,7 @@ void machine_set_cpu_numa_node(MachineState *machine,
 const CpuInstanceProperties *props, Error **errp)
 {
 MachineClass *mc = MACHINE_GET_CLASS(machine);
+NodeInfo *numa_info = machine->numa_state->nodes;
 bool match = false;
 int i;
 
@@ -714,6 +729,17 @@ void machine_set_cpu_numa_node(MachineState *machine,
 match = true;
 slot->props.node_id = props->node_id;
 slot->props.has_node_id = props->has_node_id;
+
+if (machine->numa_state->hmat_enabled) {
+if ((numa_info[props->node_id].initiator < MAX_NODES) &&
+(props->node_id != numa_info[props->node_id].initiator)) {
+error_setg(errp, "The initiator of CPU NUMA node %" PRId64
+" should be itself", props->node_id);
+return;
+}
+numa_info[props->node_id].has_cpu = true;
+numa_info[props->node_id].initiator = props->node_id;
+}
 }
 
 if (!match) {
@@ -960,6 +986,13 @@ static void machine_initfn(Object *obj)
 
 if (mc->numa_mem_supported) {
 ms->numa_state = g_new0(NumaState, 1);
+object_property_add_bool(obj, "hmat",
+ machine_get_hmat, machine_set_hmat,
+ &error_abort);
+object_property_set_description(obj, "hmat",
+"Set on/off to enable/disable "
+"ACPI Heterogeneous Memory Attribute "
+"Table (HMAT)", NULL);
 }
 
 /* Register notifier when init is done for sysbus sanity checks */
@@ -1048,6 +1081,32 @@ static char *cpu_slot_to_string(const CPUArchId *cpu)
 return g_string_free(s, false);
 }
 
+static void numa_validate_initiator(NumaState *numa_state)
+{
+int i;
+NodeInfo *numa_info = numa_state->nodes;
+
+for (i = 0; i < numa_state->num_nodes; i++) {
+if (numa_info[i].initiator == MAX_NODES) {
+error_report("The initiator of NUMA node %d is missing, use "
+ "'-numa node,initiator' option to declare it", i);
+exit(1);
+}
+
+if (!numa_info[numa_info[i].initiator].present) {
+error_report("NUMA node %" PRIu16 " is missing, use "
+ "'-numa node' option to declare it first",
+ numa_info[i].initiator);
+exit(1);
+}
+
+if (!numa_info[numa_info[i].initiator].has_cpu) {
+error_report("The initiator of NUMA node %d is invalid", i);
+exit(1);
+}
+}
+}
+
 static void machine_numa_finish_cpu_init(MachineState *machine)
 {
 int i;
@@ -1088,6 +1147,11 @@ static void machine_numa_finish_cpu_init(MachineState *machine)
 machine_set_cpu_numa_node(machine, &props, &error_fatal);
 }
 }
+
+if (machine->numa_state->hmat_enabled) {
+numa_validate_initiator(machine->numa_state);
+}
+
 if (s->len && !qtest_enabled()) {
 warn_report("CPU(s) not present in any NUMA nodes: %s",
 s->str);
diff --git a/hw/core/numa.c b/hw/core/nu

[PATCH v18 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT)

2019-11-27 Thread Tao Xu

This series of patches will build Heterogeneous Memory Attribute Table (HMAT)
according to the command line. The ACPI HMAT describes the memory attributes,
such as memory side cache attributes and bandwidth and latency details,
related to the Memory Proximity Domain.
The software is expected to use HMAT information as a hint for optimization.

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.

The V17 patches link:
https://patchwork.kernel.org/cover/11257319/

Changelog:
v18:
- Defer patches 01/14~06/14 of V17, use qapi type uint64 and
  only nanosecond for latency (Markus)
- Rewrite the lines over 80 characters (Igor)
v17:
- Add a check: when the user inputs a latency or bandwidth of 0,
  lb_info_provided should also be 0, because in ACPI 6.3 5.2.27.4,
  0 means the corresponding latency or bandwidth information is
  not provided.
- Fix the infinite loop when node->latency is 0.
- Use NumaHmatCacheOptions to replace HMAT_Cache_Info (Igor)
- Add check for unordered cache level input (Igor)
- Add some fail test cases (Igor)
v16:
- Add and use qemu_strtold_finite to parse size, support full
  64bit precision, modify related test cases (Eduardo and Markus)
- Simplify struct HMAT_LB_Info and related code, unify latency
  and bandwidth (Igor)
- Add cross check with hmat_lb data (Igor)
- Fields in Cache Attributes are promoted to uint32_t before
  shifting (Igor)
- Add case for QMP build HMAT (Igor)
v15:
- Add a new patch to refactor do_strtosz() (Eduardo)
- Make tests without breaking CI (Michael)
v14:
- Reuse the code of do_strtosz to build qemu_strtotime_ns
  (Eduardo)
- Squash patch v13 01/12 and 02/12 together (Daniel and Eduardo)
- Drop time unit picosecond (Eric)
- Use qemu ctz64 and clz64 instead of builtin function
v13:
- Modify some text description
- Drop "initiator_valid" field in struct NodeInfo
- Reuse GArray to store the raw latency and bandwidth data
- Calculate common base unit using range bitmap
- Add a patch to calculate hmat latency and bandwidth entry list
- Drop the total_levels option and use readable cache size
- Remove the unnecessary header file
- Use decimal notation with appropriate suffix for cache size
v12:
- Fix a bug that a memory-only node without initiator setting
  doesn't report error. (reported by Danmei Wei)
- Fix a bug that if HMAT is enabled and without hmat-lb setting,
  QEMU will crash. (reported by Danmei Wei)
v11:
- Move numa option patches forward.
- Add num_initiator in Numa_state to record the number of
initiators.
- Simplify struct HMAT_LB_Info, use uint64_t array to store data.
- Drop hmat_get_base().
- Calculate base in build_hmat_lb().

Liu Jingqi (5):
  numa: Extend CLI to provide memory latency and bandwidth information
  numa: Extend CLI to provide memory side cache information
  hmat acpi: Build Memory Proximity Domain Attributes Structure(s)
  hmat acpi: Build System Locality Latency and Bandwidth Information
Structure(s)
  hmat acpi: Build Memory Side Cache Information Structure(s)

Tao Xu (3):
  numa: Extend CLI to provide initiator information for numa nodes
  tests/numa: Add case for QMP build HMAT
  tests/bios-tables-test: add test cases for ACPI HMAT

 hw/acpi/Kconfig   |   7 +-
 hw/acpi/Makefile.objs |   1 +
 hw/acpi/hmat.c| 268 
 hw/acpi/hmat.h|  42 
 hw/core/machine.c |  64 ++
 hw/core/numa.c| 282 ++
 hw/i386/acpi-build.c  |   5 +
 include/sysemu/numa.h |  63 ++
 qapi/machine.json | 178 +++-
 qemu-options.hx   |  94 -
 tests/bios-tables-test-allowed-diff.h |   8 +
 tests/bios-tables-test.c  |  44 
 tests/data/acpi/pc/APIC.acpihmat  |   0
 tests/data/acpi/pc/DSDT.acpihmat  |   0
 tests/data/acpi/pc/HMAT.acpihmat  |   0
 tests/data/acpi/pc/SRAT.acpihmat  |   0
 tests/data/acpi/q35/APIC.acpihmat |   0
 tests/data/acpi/q35/DSDT.acpihmat |   0
 tests/data/acpi/q35/HMAT.acpihmat |   0
 tests/data/acpi/q35/SRAT.acpihmat |   0
 tests/numa-test.c | 197 ++
 21 files changed, 1242 insertions(+), 11 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h
 create mode 100644 tests/data/acpi/pc/APIC.acpihmat
 create mode 100644 tests/data/acpi/pc/DSDT.acpihmat
 create mode 100644 tests/data/acpi/pc/HMAT.acpihmat
 create mode 100644 tests/data/acpi/pc/SRAT.acpihmat
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat
 create mode 100644 tests/data/acpi/q3

[PATCH v18 2/8] numa: Extend CLI to provide memory latency and bandwidth information

2019-11-27 Thread Tao Xu

From: Liu Jingqi 

Add -numa hmat-lb option to provide System Locality Latency and
Bandwidth Information. These memory attributes help to build
System Locality Latency and Bandwidth Information Structure(s)
in ACPI Heterogeneous Memory Attribute Table (HMAT).
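
A minimal sketch of the base-unit compression that the latency path in the
diff below performs: each latency is reduced by the largest power of ten that
divides it, and the smallest such factor across all entries becomes the common
base. The function names are hypothetical and the code is standalone C, not
the QEMU implementation; zero entries are treated as "not provided", as the
v17 changelog describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Largest power of ten that divides v exactly (v must be non-zero). */
static uint64_t sketch_pow10_factor(uint64_t v)
{
    uint64_t base = 1;

    assert(v != 0);
    while (v % 10 == 0) {
        v /= 10;
        base *= 10;
    }
    return base;
}

/*
 * Smallest per-entry power-of-ten factor across all non-zero latencies.
 * Because every factor is a power of ten, the smallest one divides all
 * entries exactly. Returns 0 when no entry is provided.
 */
static uint64_t sketch_common_base(const uint64_t *lat_ns, size_t n)
{
    uint64_t base = 0;

    for (size_t i = 0; i < n; i++) {
        uint64_t factor;

        if (lat_ns[i] == 0) {
            continue;  /* 0 means "no information", skip it */
        }
        factor = sketch_pow10_factor(lat_ns[i]);
        if (base == 0 || factor < base) {
            base = factor;
        }
    }
    return base;
}
```

With latencies of 100, 2500 and 30000 ns the common base is 100, so the
compressed entries are 1, 25 and 300, which fit easily into the 16-bit entry
fields of the table.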

Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

Changes in v18:
- Use qapi type uint64 and only nanosecond for latency (Markus)

Changes in v17:
- Add a check: when the user inputs a latency or bandwidth of 0,
  lb_info_provided should also be 0, because in ACPI 6.3 5.2.27.4,
  0 means the corresponding latency or bandwidth information is
  not provided.
- Fix the infinite loop when node->latency is 0.

Changes in v16:
- Initialize HMAT_LB_Data lb_data (Igor)
- Remove punctuation from error_setg (Igor)
- Correct some description (Igor)
- Drop statement about max value (Igor)
- Simplify struct HMAT_LB_Info and related code, unify latency
  and bandwidth (Igor)

Changes in v15:
- Change the QAPI version tag to 5.0 (Eric)
---
 hw/core/numa.c| 181 ++
 include/sysemu/numa.h |  53 +
 qapi/machine.json |  94 +-
 qemu-options.hx   |  47 ++-
 4 files changed, 372 insertions(+), 3 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index e60da99293..2183c8df1f 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "sysemu/hostmem.h"
 #include "sysemu/numa.h"
 #include "sysemu/sysemu.h"
@@ -198,6 +199,173 @@ void parse_numa_distance(MachineState *ms, NumaDistOptions *dist, Error **errp)
 ms->numa_state->have_numa_distance = true;
 }
 
+void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
+Error **errp)
+{
+int i, first_bit, last_bit;
+uint64_t max_entry, temp_base_la;
+NodeInfo *numa_info = numa_state->nodes;
+HMAT_LB_Info *hmat_lb =
+numa_state->hmat_lb[node->hierarchy][node->data_type];
+HMAT_LB_Data lb_data = {};
+HMAT_LB_Data *lb_temp;
+
+/* Error checking */
+if (node->initiator >= numa_state->num_nodes) {
+error_setg(errp, "Invalid initiator=%d, it should be less than %d",
+   node->initiator, numa_state->num_nodes);
+return;
+}
+if (node->target >= numa_state->num_nodes) {
+error_setg(errp, "Invalid target=%d, it should be less than %d",
+   node->target, numa_state->num_nodes);
+return;
+}
+if (!numa_info[node->initiator].has_cpu) {
+error_setg(errp, "Invalid initiator=%d, it isn't an "
+   "initiator proximity domain", node->initiator);
+return;
+}
+if (!numa_info[node->target].present) {
+error_setg(errp, "The target=%d should point to an existing node",
+   node->target);
+return;
+}
+
+if (!hmat_lb) {
+hmat_lb = g_malloc0(sizeof(*hmat_lb));
+numa_state->hmat_lb[node->hierarchy][node->data_type] = hmat_lb;
+hmat_lb->list = g_array_new(false, true, sizeof(HMAT_LB_Data));
+}
+hmat_lb->hierarchy = node->hierarchy;
+hmat_lb->data_type = node->data_type;
+lb_data.initiator = node->initiator;
+lb_data.target = node->target;
+
+if (node->data_type <= HMATLB_DATA_TYPE_WRITE_LATENCY) {
+/* Input latency data */
+
+if (!node->has_latency) {
+error_setg(errp, "Missing 'latency' option");
+return;
+}
+if (node->has_bandwidth) {
+error_setg(errp, "Invalid option 'bandwidth' since "
+   "the data type is latency");
+return;
+}
+
+/* Detect duplicate configuration */
+for (i = 0; i < hmat_lb->list->len; i++) {
+lb_temp = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
+
+if (node->initiator == lb_temp->initiator &&
+node->target == lb_temp->target) {
+error_setg(errp, "Duplicate configuration of the latency for "
+"initiator=%d and target=%d", node->initiator,
+node->target);
+return;
+}
+}
+
+hmat_lb->base = hmat_lb->base ? hmat_lb->base : UINT64_MAX;
+
+if (node->latency) {
+/* Calculate the temporary base and compressed latency */
+max_entry = node->latency;
+temp_base_la = 1;
+while (QEMU_IS_ALIGNED(max_entry, 10)) {
+max_entry /= 10;
+temp_base_la *= 10;
+}
+
+/* Calculate the max compressed latency */
+hmat_lb->base = MIN(hmat_lb->base, temp_base_la);
+max_entry = node->latency / hmat_lb->base;
+hmat_lb->range_bitmap = MAX(hmat_lb->range_bitmap, max_entry);
+
+/*
+ * For latency hmat_lb->range_bitmap record the max

[PATCH v18 3/8] numa: Extend CLI to provide memory side cache information

2019-11-27 Thread Tao Xu

From: Liu Jingqi 

Add -numa hmat-cache option to provide Memory Side Cache Information.
These memory attributes help to build Memory Side Cache Information
Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).

Reviewed-by: Igor Mammedov 
Reviewed-by: Daniel Black 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

Changes in v18:
- Update the error message (Igor)

Changes in v17:
- Use NumaHmatCacheOptions to replace HMAT_Cache_Info (Igor)
- Add check for unordered cache level input (Igor)

Changes in v16:
- Add cross check with hmat_lb data (Igor)
- Drop total_levels in struct HMAT_Cache_Info (Igor)
- Correct the error table number (Igor)

Changes in v15:
- Change the QAPI version tag to 5.0 (Eric)
---
 hw/core/numa.c| 78 +++
 include/sysemu/numa.h |  5 +++
 qapi/machine.json | 78 +--
 qemu-options.hx   | 16 +++--
 4 files changed, 173 insertions(+), 4 deletions(-)

diff --git a/hw/core/numa.c b/hw/core/numa.c
index 2183c8df1f..3d87fcbf2d 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -366,6 +366,71 @@ void parse_numa_hmat_lb(NumaState *numa_state, NumaHmatLBOptions *node,
 g_array_append_val(hmat_lb->list, lb_data);
 }
 
+void parse_numa_hmat_cache(MachineState *ms, NumaHmatCacheOptions *node,
+   Error **errp)
+{
+int nb_numa_nodes = ms->numa_state->num_nodes;
+NodeInfo *numa_info = ms->numa_state->nodes;
+NumaHmatCacheOptions *hmat_cache = NULL;
+
+if (node->node_id >= nb_numa_nodes) {
+error_setg(errp, "Invalid node-id=%" PRIu32 ", it should be less "
+   "than %d", node->node_id, nb_numa_nodes);
+return;
+}
+
+if (numa_info[node->node_id].lb_info_provided != (BIT(0) | BIT(1))) {
+error_setg(errp, "The latency and bandwidth information of "
+   "node-id=%" PRIu32 " should be provided before memory side "
+   "cache attributes", node->node_id);
+return;
+}
+
+if (node->level >= HMAT_LB_LEVELS) {
+error_setg(errp, "Invalid level=%" PRIu8 ", it should be less than or "
+   "equal to %d", node->level, HMAT_LB_LEVELS - 1);
+return;
+}
+assert(node->assoc < HMAT_CACHE_ASSOCIATIVITY__MAX);
+assert(node->policy < HMAT_CACHE_WRITE_POLICY__MAX);
+if (ms->numa_state->hmat_cache[node->node_id][node->level]) {
+error_setg(errp, "Duplicate configuration of the side cache for "
+   "node-id=%" PRIu32 " and level=%" PRIu8,
+   node->node_id, node->level);
+return;
+}
+
+if ((node->level > 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level - 1] &&
+(node->size >=
+ms->numa_state->hmat_cache[node->node_id][node->level - 1]->size)) {
+error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8
+   " should be less than the size(%" PRIu64 ") of "
+   "level=%" PRIu8, node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+ [node->level - 1]->size,
+   node->level - 1);
+return;
+}
+
+if ((node->level < HMAT_LB_LEVELS - 1) &&
+ms->numa_state->hmat_cache[node->node_id][node->level + 1] &&
+(node->size <=
+ms->numa_state->hmat_cache[node->node_id][node->level + 1]->size)) {
+error_setg(errp, "Invalid size=%" PRIu64 ", the size of level=%" PRIu8
+   " should be larger than the size(%" PRIu64 ") of "
+   "level=%" PRIu8, node->size, node->level,
+   ms->numa_state->hmat_cache[node->node_id]
+ [node->level + 1]->size,
+   node->level + 1);
+return;
+}
+
+hmat_cache = g_malloc0(sizeof(*hmat_cache));
+memcpy(hmat_cache, node, sizeof(*hmat_cache));
+ms->numa_state->hmat_cache[node->node_id][node->level] = hmat_cache;
+}
+
 void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 {
 Error *err = NULL;
@@ -417,6 +482,19 @@ void set_numa_options(MachineState *ms, NumaOptions *object, Error **errp)
 goto end;
 }
 break;
+case NUMA_OPTIONS_TYPE_HMAT_CACHE:
+if (!ms->numa_state->hmat_enabled) {
+error_setg(errp, "ACPI Heterogeneous Memory Attribute Table "
+   "(HMAT) is disabled, enable it with -machine hmat=on "
+   "before using any of hmat specific options");
+return;
+}
+
+parse_numa_hmat_cache(ms, &object->u.hmat_cache, &err);
+if (err) {
+goto end;
+}
+break;
 default:
 abort();
 }
diff --git a/include/sysemu/numa.h b/include/sysemu/numa.h
index 70f93c83d7..ba693cc80b 100644
--- a/include

[PATCH v18 4/8] hmat acpi: Build Memory Proximity Domain Attributes Structure(s)

2019-11-27 Thread Tao Xu

From: Liu Jingqi 

HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
(HMAT). The specification is available at the link below:
http://www.uefi.org/sites/default/files/resources/ACPI_6_3_final_Jan30.pdf

It describes the memory attributes, such as memory side cache
attributes and bandwidth and latency details, related to the
Memory Proximity Domain. The software is expected to use this
information as a hint for optimization.

This structure describes Memory Proximity Domain Attributes by memory
subsystem and its associativity with a processor proximity domain, as well
as hints for memory usage.
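
A minimal sketch of the 40-byte layout that build_hmat_mpda() in the diff
below emits, using a plain byte buffer instead of QEMU's GArray and
build_append_int_noprefix(). The helper names are hypothetical; the field
order and sizes are copied from the diff, and little-endian byte order is
assumed, as is usual for ACPI tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MPDA_LEN 40

/* Store the low `size` bytes of `value` at buf[off], little-endian. */
static size_t sketch_append_le(uint8_t *buf, size_t off,
                               uint64_t value, size_t size)
{
    assert(size >= 1 && size <= 8);
    for (size_t i = 0; i < size; i++) {
        buf[off + i] = (uint8_t)(value >> (8 * i));
    }
    return off + size;
}

/* Fill buf with one Memory Proximity Domain Attributes Structure. */
static size_t sketch_build_mpda(uint8_t buf[SKETCH_MPDA_LEN], uint16_t flags,
                                uint32_t initiator, uint32_t mem_node)
{
    size_t off = 0;

    off = sketch_append_le(buf, off, 0, 2);               /* Type */
    off = sketch_append_le(buf, off, 0, 2);               /* Reserved */
    off = sketch_append_le(buf, off, SKETCH_MPDA_LEN, 4); /* Length */
    off = sketch_append_le(buf, off, flags, 2);           /* Flags */
    off = sketch_append_le(buf, off, 0, 2);               /* Reserved */
    off = sketch_append_le(buf, off, initiator, 4);       /* Initiator PD */
    off = sketch_append_le(buf, off, mem_node, 4);        /* Memory PD */
    off = sketch_append_le(buf, off, 0, 4);               /* Reserved */
    off = sketch_append_le(buf, off, 0, 8);               /* Reserved */
    off = sketch_append_le(buf, off, 0, 8);               /* Reserved */
    return off;  /* total bytes written */
}
```

The field sizes add up to 2 + 2 + 4 + 2 + 2 + 4 + 4 + 4 + 8 + 8 = 40, which
is the value written into the Length field.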

In the Linux kernel, the code in drivers/acpi/hmat/hmat.c parses and reports
the platform's HMAT tables.

Reviewed-by: Igor Mammedov 
Reviewed-by: Daniel Black 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

No changes in v18.

Changes in v16:
- Use uint32_t for initiator and mem_node

Changes in v13:
- Remove the unnecessary header file.
---
 hw/acpi/Kconfig   |  7 ++-
 hw/acpi/Makefile.objs |  1 +
 hw/acpi/hmat.c| 99 +++
 hw/acpi/hmat.h| 42 ++
 hw/i386/acpi-build.c  |  5 +++
 5 files changed, 152 insertions(+), 2 deletions(-)
 create mode 100644 hw/acpi/hmat.c
 create mode 100644 hw/acpi/hmat.h

diff --git a/hw/acpi/Kconfig b/hw/acpi/Kconfig
index 12e3f1e86e..54209c6f2f 100644
--- a/hw/acpi/Kconfig
+++ b/hw/acpi/Kconfig
@@ -7,6 +7,7 @@ config ACPI_X86
 select ACPI_NVDIMM
 select ACPI_CPU_HOTPLUG
 select ACPI_MEMORY_HOTPLUG
+select ACPI_HMAT
 
 config ACPI_X86_ICH
 bool
@@ -23,6 +24,10 @@ config ACPI_NVDIMM
 bool
 depends on ACPI
 
+config ACPI_HMAT
+bool
+depends on ACPI
+
 config ACPI_PCI
 bool
 depends on ACPI && PCI
@@ -33,5 +38,3 @@ config ACPI_VMGENID
 depends on PC
 
 config ACPI_HW_REDUCED
-bool
-depends on ACPI
diff --git a/hw/acpi/Makefile.objs b/hw/acpi/Makefile.objs
index 655a9c1973..517bd88704 100644
--- a/hw/acpi/Makefile.objs
+++ b/hw/acpi/Makefile.objs
@@ -7,6 +7,7 @@ common-obj-$(CONFIG_ACPI_CPU_HOTPLUG) += cpu.o
 common-obj-$(CONFIG_ACPI_NVDIMM) += nvdimm.o
 common-obj-$(CONFIG_ACPI_VMGENID) += vmgenid.o
 common-obj-$(CONFIG_ACPI_HW_REDUCED) += generic_event_device.o
+common-obj-$(CONFIG_ACPI_HMAT) += hmat.o
 common-obj-$(call lnot,$(CONFIG_ACPI_X86)) += acpi-stub.o
 
 common-obj-y += acpi_interface.o
diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
new file mode 100644
index 00..9ff79308a4
--- /dev/null
+++ b/hw/acpi/hmat.c
@@ -0,0 +1,99 @@
+/*
+ * HMAT ACPI Implementation
+ *
+ * Copyright(C) 2019 Intel Corporation.
+ *
+ * Author:
+ *  Liu jingqi 
+ *  Tao Xu 
+ *
+ * HMAT is defined in ACPI 6.3: 5.2.27 Heterogeneous Memory Attribute Table
+ * (HMAT)
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see 
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/numa.h"
+#include "hw/acpi/hmat.h"
+
+/*
+ * ACPI 6.3:
+ * 5.2.27.3 Memory Proximity Domain Attributes Structure: Table 5-145
+ */
+static void build_hmat_mpda(GArray *table_data, uint16_t flags,
+uint32_t initiator, uint32_t mem_node)
+{
+
+/* Memory Proximity Domain Attributes Structure */
+/* Type */
+build_append_int_noprefix(table_data, 0, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, 40, 4);
+/* Flags */
+build_append_int_noprefix(table_data, flags, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Proximity Domain for the Attached Initiator */
+build_append_int_noprefix(table_data, initiator, 4);
+/* Proximity Domain for the Memory */
+build_append_int_noprefix(table_data, mem_node, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+/*
+ * Reserved:
+ * Previously defined as the Start Address of the System Physical
+ * Address Range. Deprecated since ACPI Spec 6.3.
+ */
+build_append_int_noprefix(table_data, 0, 8);
+/*
+ * Reserved:
+ * Previously defined as the Range Length of the region in bytes.
+ * Deprecated since ACPI Spec 6.3.
+ */
+build_append_int_noprefix(table_data, 0, 8);
+}
+
+/* Build HMAT sub table structures */
+static void hmat_build_table_structs(GArray *table_data, NumaStat

[PATCH v18 5/8] hmat acpi: Build System Locality Latency and Bandwidth Information Structure(s)

2019-11-27 Thread Tao Xu

From: Liu Jingqi 

This structure describes the memory access latency and bandwidth
information from various memory access initiator proximity domains.
The latency and bandwidth numbers represented in this structure
correspond to rated latency and bandwidth for the platform.
The software could use this information as a hint for optimization.
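
A minimal sketch of the two pieces of arithmetic build_hmat_lb() relies on in
the diff below: the total structure length and the row-major position of an
(initiator, target) entry. The function and macro names are hypothetical; the
constants come from the lb_length expression and the index calculation in the
patch.

```c
#include <assert.h>
#include <stdint.h>

/* Fixed part of the structure, up to and including Entry Base Unit. */
#define SKETCH_LB_HEADER_LEN 32u

/* Length in bytes of the whole structure for s initiators and t targets. */
static uint32_t sketch_lb_length(uint32_t num_initiator, uint32_t num_target)
{
    return SKETCH_LB_HEADER_LEN
           + 4u * num_initiator               /* initiator domain list */
           + 4u * num_target                  /* target domain list */
           + 2u * num_initiator * num_target; /* 16-bit entry matrix */
}

/* Row-major position of the (initiator, target) entry in the matrix. */
static uint32_t sketch_lb_entry_index(uint32_t initiator_idx,
                                      uint32_t target_idx,
                                      uint32_t num_target)
{
    assert(target_idx < num_target);
    return initiator_idx * num_target + target_idx;
}
```

For one initiator and two targets the structure is 32 + 4 + 8 + 4 = 48 bytes
long. The entry matrix always holds num_initiator * num_target values, so any
buffer used to stage the entries needs that many slots.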

Reviewed-by: Igor Mammedov 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

No changes in v18.

Changes in v17:
- Remove unnecessary header file (Igor)

Changes in v16:
- Add more description for lb_length (Igor)
- Drop entry_list and calculate entries in this patch (Igor)

Changes in v13:
- Calculate the entries in a new patch.
---
 hw/acpi/hmat.c | 104 -
 1 file changed, 103 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index 9ff79308a4..e5ee8b4317 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -25,6 +25,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/units.h"
 #include "sysemu/numa.h"
 #include "hw/acpi/hmat.h"
 
@@ -67,11 +68,89 @@ static void build_hmat_mpda(GArray *table_data, uint16_t flags,
 build_append_int_noprefix(table_data, 0, 8);
 }
 
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-146
+ */
+static void build_hmat_lb(GArray *table_data, HMAT_LB_Info *hmat_lb,
+  uint32_t num_initiator, uint32_t num_target,
+  uint32_t *initiator_list)
+{
+int i, index;
+HMAT_LB_Data *lb_data;
+uint16_t *entry_list;
+uint32_t base;
+/* Length in bytes for entire structure */
+uint32_t lb_length
+= 32 /* Table length up to and including Entry Base Unit */
++ 4 * num_initiator /* Initiator Proximity Domain List */
++ 4 * num_target /* Target Proximity Domain List */
++ 2 * num_initiator * num_target; /* Latency or Bandwidth Entries */
+
+/* Type */
+build_append_int_noprefix(table_data, 1, 2);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Length */
+build_append_int_noprefix(table_data, lb_length, 4);
+/* Flags: Bits [3:0] Memory Hierarchy, Bits[7:4] Reserved */
+assert(!(hmat_lb->hierarchy >> 4));
+build_append_int_noprefix(table_data, hmat_lb->hierarchy, 1);
+/* Data Type */
+build_append_int_noprefix(table_data, hmat_lb->data_type, 1);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 2);
+/* Number of Initiator Proximity Domains (s) */
+build_append_int_noprefix(table_data, num_initiator, 4);
+/* Number of Target Proximity Domains (t) */
+build_append_int_noprefix(table_data, num_target, 4);
+/* Reserved */
+build_append_int_noprefix(table_data, 0, 4);
+
+/* Entry Base Unit */
+if (hmat_lb->data_type <= HMAT_LB_DATA_WRITE_LATENCY) {
+/* Convert latency base from nanoseconds to picoseconds */
+base = hmat_lb->base * 1000;
+} else {
+/* Convert bandwidth base from bytes to megabytes */
+base = hmat_lb->base / MiB;
+}
+build_append_int_noprefix(table_data, base, 8);
+
+/* Initiator Proximity Domain List */
+for (i = 0; i < num_initiator; i++) {
+build_append_int_noprefix(table_data, initiator_list[i], 4);
+}
+
+/* Target Proximity Domain List */
+for (i = 0; i < num_target; i++) {
+build_append_int_noprefix(table_data, i, 4);
+}
+
+/* Latency or Bandwidth Entries */
+entry_list = g_malloc0(num_initiator * num_target * sizeof(uint16_t));
+for (i = 0; i < hmat_lb->list->len; i++) {
+lb_data = &g_array_index(hmat_lb->list, HMAT_LB_Data, i);
+index = lb_data->initiator * num_target + lb_data->target;
+
+entry_list[index] = (uint16_t)(lb_data->data / hmat_lb->base);
+}
+
+for (i = 0; i < num_initiator * num_target; i++) {
+build_append_int_noprefix(table_data, entry_list[i], 2);
+}
+
+g_free(entry_list);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 {
 uint16_t flags;
-int i;
+uint32_t num_initiator = 0;
+uint32_t initiator_list[MAX_NODES];
+int i, hierarchy, type;
+HMAT_LB_Info *hmat_lb;
 
 for (i = 0; i < numa_state->num_nodes; i++) {
 flags = 0;
@@ -82,6 +161,29 @@ static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 
 build_hmat_mpda(table_data, flags, numa_state->nodes[i].initiator, i);
 }
+
+for (i = 0; i < numa_state->num_nodes; i++) {
+if (numa_state->nodes[i].has_cpu) {
+initiator_list[num_initiator++] = i;
+}
+}
+
+/*
+ * ACPI 6.3: 5.2.27.4 System Locality Latency and Bandwidth Information
+ * Structure: Table 5-146
+ */
+for (hierarchy = HMAT_LB_MEM_MEMORY;
+ hierarchy <= HMAT_LB_MEM_CACHE_3RD_LEVEL; hierarchy++) {
+for (type = HMAT_LB_DATA

[PATCH v18 7/8] tests/numa: Add case for QMP build HMAT

2019-11-27 Thread Tao Xu

Check the HMAT configuration use case.

Reviewed-by: Igor Mammedov 
Suggested-by: Igor Mammedov 
Signed-off-by: Tao Xu 
---

Changes in v18:
- Rewrite the lines over 80 characters

Changes in v17:
- Add some fail test cases (Igor)
---
 tests/numa-test.c | 197 ++
 1 file changed, 197 insertions(+)

diff --git a/tests/numa-test.c b/tests/numa-test.c
index 8de8581231..9cc6c5189e 100644
--- a/tests/numa-test.c
+++ b/tests/numa-test.c
@@ -327,6 +327,200 @@ static void pc_dynamic_cpu_cfg(const void *data)
 qtest_quit(qs);
 }
 
+static void pc_hmat_build_cfg(const void *data)
+{
+QTestState *qs = qtest_initf("%s -nodefaults --preconfig -machine hmat=on "
+ "-smp 2,sockets=2 "
+ "-m 128M,slots=2,maxmem=1G "
+ "-object memory-backend-ram,size=64M,id=m0 "
+ "-object memory-backend-ram,size=64M,id=m1 "
+ "-numa node,nodeid=0,memdev=m0 "
+ "-numa node,nodeid=1,memdev=m1,initiator=0 "
+ "-numa cpu,node-id=0,socket-id=0 "
+ "-numa cpu,node-id=0,socket-id=1",
+ data ? (char *)data : "");
+
+/* Fail: Initiator should be less than the number of nodes */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 2, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\" } }")));
+
+/* Fail: Target should be less than the number of nodes */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 2,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\" } }")));
+
+/* Fail: Initiator should contain cpu */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 1, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\" } }")));
+
+/* Fail: Data-type mismatch */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"write-latency\","
+" 'bandwidth': 524288000 } }")));
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"read-bandwidth\","
+" 'latency': 5 } }")));
+
+/* Fail: Bandwidth should be 1MB (1048576) aligned */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
+" 'bandwidth': 1048575 } }")));
+
+/* Configuring HMAT bandwidth and latency details */
+g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\","
+" 'latency': 1 } }")));/* 1 ns */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\","
+" 'latency': 5 } }")));/* Fail: Duplicate configuration */
+g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 0,"
+" 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
+" 'bandwidth': 68717379584 } }")));/* 65534 MB/s */
+g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 1,"
+" 'hierarchy': \"memory\", 'data-type': \"access-latency\","
+" 'latency': 65534 } }")));/* 65534 ns */
+g_assert(!qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-lb', 'initiator': 0, 'target': 1,"
+" 'hierarchy': \"memory\", 'data-type': \"access-bandwidth\","
+" 'bandwidth': 34358689792 } }")));/* 32767 MB/s */
+
+/* Fail: node_id should be less than the number of nodes */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-cache', 'node-id': 2, 'size': 10240,"
+" 'level': 1, 'assoc': \"direct\", 'policy': \"write-back\","
+" 'line': 8 } }")));
+
+/* Fail: level should be less than HMAT_LB_LEVELS (4) */
+g_assert(qmp_rsp_is_err(qtest_qmp(qs, "{ 'execute': 'set-numa-node',"
+" 'arguments': { 'type': 'hmat-cache', 'node-id': 0, 'size': 10240,"
+" 'level': 4, 'asso
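
The bandwidth arguments in the requests above are plain byte counts. As a minimal sketch, assuming the 1 MB (1048576) alignment rule stated in the test comment, the MB/s figures in the trailing comments can be checked as follows. The helper names are hypothetical and are not QEMU functions.

```c
#include <stdbool.h>
#include <stdint.h>

#define ONE_MB UINT64_C(1048576)

/* Hypothetical helper: convert a MB/s figure into the byte count sent via QMP. */
static uint64_t hmat_mb_to_bytes(uint64_t mb_per_s)
{
    return mb_per_s * ONE_MB;
}

/* Hypothetical helper: true if the value is a multiple of 1 MB (1048576). */
static bool hmat_bandwidth_is_aligned(uint64_t bytes)
{
    return bytes % ONE_MB == 0;
}
```

With these, 65534 MB/s maps to 68717379584 and 32767 MB/s to 34358689792, while 1048575 fails the alignment check, matching the expected failure above.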

[PATCH v18 6/8] hmat acpi: Build Memory Side Cache Information Structure(s)

2019-11-27 Thread Tao Xu

From: Liu Jingqi 

This structure describes memory side cache information for memory
proximity domains if the memory side cache is present and the
physical device forms the memory side cache.
The software could use this information to effectively place
the data in memory to maximize the performance of the system
memory that uses the memory side cache.

Reviewed-by: Igor Mammedov 
Reviewed-by: Daniel Black 
Reviewed-by: Jonathan Cameron 
Signed-off-by: Liu Jingqi 
Signed-off-by: Tao Xu 
---

No changes in v18.

Changes in v16:
- Use checks and assert to replace masks (Igor)
- Fields in Cache Attributes are promoted to uint32_t before
  shifting (Igor)
- Drop cpu_to_le32() (Igor)

Changes in v13:
- rename level as cache_level
---
 hw/acpi/hmat.c | 69 +-
 1 file changed, 68 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/hmat.c b/hw/acpi/hmat.c
index e5ee8b4317..bb6adb0ccf 100644
--- a/hw/acpi/hmat.c
+++ b/hw/acpi/hmat.c
@@ -143,14 +143,62 @@ static void build_hmat_lb(GArray *table_data, 
HMAT_LB_Info *hmat_lb,
 g_free(entry_list);
 }
 
+/* ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure: Table 5-147 */
+static void build_hmat_cache(GArray *table_data, uint8_t total_levels,
+                             NumaHmatCacheOptions *hmat_cache)
+{
+    /*
+     * Cache Attributes: Bits [3:0] – Total Cache Levels
+     * for this Memory Proximity Domain
+     */
+    uint32_t cache_attr = total_levels;
+
+    /* Bits [7:4] : Cache Level described in this structure */
+    cache_attr |= (uint32_t) hmat_cache->level << 4;
+
+    /* Bits [11:8] - Cache Associativity */
+    cache_attr |= (uint32_t) hmat_cache->assoc << 8;
+
+    /* Bits [15:12] - Write Policy */
+    cache_attr |= (uint32_t) hmat_cache->policy << 12;
+
+    /* Bits [31:16] - Cache Line size in bytes */
+    cache_attr |= (uint32_t) hmat_cache->line << 16;
+
+    /* Type */
+    build_append_int_noprefix(table_data, 2, 2);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /* Length */
+    build_append_int_noprefix(table_data, 32, 4);
+    /* Proximity Domain for the Memory */
+    build_append_int_noprefix(table_data, hmat_cache->node_id, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 4);
+    /* Memory Side Cache Size */
+    build_append_int_noprefix(table_data, hmat_cache->size, 8);
+    /* Cache Attributes */
+    build_append_int_noprefix(table_data, cache_attr, 4);
+    /* Reserved */
+    build_append_int_noprefix(table_data, 0, 2);
+    /*
+     * Number of SMBIOS handles (n)
+     * Linux kernel uses Memory Side Cache Information Structure
+     * without SMBIOS entries for now, so set Number of SMBIOS handles
+     * as 0.
+     */
+    build_append_int_noprefix(table_data, 0, 2);
+}
+
 /* Build HMAT sub table structures */
 static void hmat_build_table_structs(GArray *table_data, NumaState *numa_state)
 {
 uint16_t flags;
 uint32_t num_initiator = 0;
 uint32_t initiator_list[MAX_NODES];
-int i, hierarchy, type;
+int i, hierarchy, type, cache_level, total_levels;
 HMAT_LB_Info *hmat_lb;
+NumaHmatCacheOptions *hmat_cache;
 
 for (i = 0; i < numa_state->num_nodes; i++) {
 flags = 0;
@@ -184,6 +232,25 @@ static void hmat_build_table_structs(GArray *table_data, 
NumaState *numa_state)
 }
 }
 }
+
+
+    /*
+     * ACPI 6.3: 5.2.27.5 Memory Side Cache Information Structure:
+     * Table 5-147
+     */
+    for (i = 0; i < numa_state->num_nodes; i++) {
+        total_levels = 0;
+        for (cache_level = 1; cache_level < HMAT_LB_LEVELS; cache_level++) {
+            if (numa_state->hmat_cache[i][cache_level]) {
+                total_levels++;
+            }
+        }
+        for (cache_level = 0; cache_level <= total_levels; cache_level++) {
+            hmat_cache = numa_state->hmat_cache[i][cache_level];
+            if (hmat_cache) {
+                build_hmat_cache(table_data, total_levels, hmat_cache);
+            }
+        }
+    }
 }
 
 void build_hmat(GArray *table_data, BIOSLinker *linker, NumaState *numa_state)
-- 
2.20.1
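
The Cache Attributes packing in build_hmat_cache() can be tried outside QEMU. The following is a minimal standalone sketch that follows the bit layout quoted in the patch comments. The function name and the example values are illustrative assumptions, and the range check only approximates the "checks and assert" mentioned in the v16 notes.

```c
#include <assert.h>
#include <stdint.h>

/* Pack the Cache Attributes field following the bit layout quoted in the patch. */
static uint32_t pack_cache_attr(uint8_t total_levels, uint8_t level,
                                uint8_t assoc, uint8_t policy, uint16_t line)
{
    /* Each of the four low fields is 4 bits wide, so it must be below 16. */
    assert(total_levels < 16 && level < 16 && assoc < 16 && policy < 16);

    uint32_t attr = total_levels;      /* bits [3:0]   total cache levels */
    attr |= (uint32_t)level << 4;      /* bits [7:4]   cache level        */
    attr |= (uint32_t)assoc << 8;      /* bits [11:8]  associativity      */
    attr |= (uint32_t)policy << 12;    /* bits [15:12] write policy       */
    attr |= (uint32_t)line << 16;      /* bits [31:16] line size in bytes */
    return attr;
}
```

For example, pack_cache_attr(1, 1, 1, 1, 8) yields 0x00081111. The field widths appended by the patch (2 + 2 + 4 + 4 + 4 + 8 + 4 + 2 + 2) add up to 32 bytes, which agrees with the Length value written into the structure.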

[PATCH v18 8/8] tests/bios-tables-test: add test cases for ACPI HMAT

2019-11-27 Thread Tao Xu

The ACPI HMAT table has been introduced, and QEMU now builds HMAT tables
for heterogeneous memory with the boot option '-numa node'.

Add test cases on PC and Q35 machines with 2 NUMA nodes.
Because HMAT is generated when the system enables NUMA, the
following tables need to be added for this test:
tests/data/acpi/pc/APIC.acpihmat
tests/data/acpi/pc/SRAT.acpihmat
tests/data/acpi/pc/HMAT.acpihmat
tests/data/acpi/pc/DSDT.acpihmat
tests/data/acpi/q35/APIC.acpihmat
tests/data/acpi/q35/SRAT.acpihmat
tests/data/acpi/q35/HMAT.acpihmat
tests/data/acpi/q35/DSDT.acpihmat

Reviewed-by: Igor Mammedov 
Reviewed-by: Daniel Black 
Reviewed-by: Jingqi Liu 
Suggested-by: Igor Mammedov 
Signed-off-by: Tao Xu 
---

Changes in v18:
- Remove unit "ns".

Changes in v17:
- Update the latency and bandwidth

Changes in v15:
- Make tests without breaking CI (Michael)

Changes in v13:
- Use decimal notation with appropriate suffix for cache size
---
 tests/bios-tables-test-allowed-diff.h |  8 +
 tests/bios-tables-test.c  | 44 +++
 tests/data/acpi/pc/APIC.acpihmat  |  0
 tests/data/acpi/pc/DSDT.acpihmat  |  0
 tests/data/acpi/pc/HMAT.acpihmat  |  0
 tests/data/acpi/pc/SRAT.acpihmat  |  0
 tests/data/acpi/q35/APIC.acpihmat |  0
 tests/data/acpi/q35/DSDT.acpihmat |  0
 tests/data/acpi/q35/HMAT.acpihmat |  0
 tests/data/acpi/q35/SRAT.acpihmat |  0
 10 files changed, 52 insertions(+)
 create mode 100644 tests/data/acpi/pc/APIC.acpihmat
 create mode 100644 tests/data/acpi/pc/DSDT.acpihmat
 create mode 100644 tests/data/acpi/pc/HMAT.acpihmat
 create mode 100644 tests/data/acpi/pc/SRAT.acpihmat
 create mode 100644 tests/data/acpi/q35/APIC.acpihmat
 create mode 100644 tests/data/acpi/q35/DSDT.acpihmat
 create mode 100644 tests/data/acpi/q35/HMAT.acpihmat
 create mode 100644 tests/data/acpi/q35/SRAT.acpihmat

diff --git a/tests/bios-tables-test-allowed-diff.h 
b/tests/bios-tables-test-allowed-diff.h
index dfb8523c8b..3c9e0c979b 100644
--- a/tests/bios-tables-test-allowed-diff.h
+++ b/tests/bios-tables-test-allowed-diff.h
@@ -1 +1,9 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/pc/APIC.acpihmat",
+"tests/data/acpi/pc/SRAT.acpihmat",
+"tests/data/acpi/pc/HMAT.acpihmat",
+"tests/data/acpi/pc/DSDT.acpihmat",
+"tests/data/acpi/q35/APIC.acpihmat",
+"tests/data/acpi/q35/SRAT.acpihmat",
+"tests/data/acpi/q35/HMAT.acpihmat",
+"tests/data/acpi/q35/DSDT.acpihmat",
diff --git a/tests/bios-tables-test.c b/tests/bios-tables-test.c
index 79f5da092f..cb1de58053 100644
--- a/tests/bios-tables-test.c
+++ b/tests/bios-tables-test.c
@@ -947,6 +947,48 @@ static void test_acpi_virt_tcg_numamem(void)
 
 }
 
+static void test_acpi_tcg_acpi_hmat(const char *machine)
+{
+    test_data data;
+
+    memset(&data, 0, sizeof(data));
+    data.machine = machine;
+    data.variant = ".acpihmat";
+    test_acpi_one(" -machine hmat=on"
+                  " -smp 2,sockets=2"
+                  " -m 128M,slots=2,maxmem=1G"
+                  " -object memory-backend-ram,size=64M,id=m0"
+                  " -object memory-backend-ram,size=64M,id=m1"
+                  " -numa node,nodeid=0,memdev=m0"
+                  " -numa node,nodeid=1,memdev=m1,initiator=0"
+                  " -numa cpu,node-id=0,socket-id=0"
+                  " -numa cpu,node-id=0,socket-id=1"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-latency,latency=1"
+                  " -numa hmat-lb,initiator=0,target=0,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=65534M"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-latency,latency=65534"
+                  " -numa hmat-lb,initiator=0,target=1,hierarchy=memory,"
+                  "data-type=access-bandwidth,bandwidth=32767M"
+                  " -numa hmat-cache,node-id=0,size=10K,level=1,assoc=direct,"
+                  "policy=write-back,line=8"
+                  " -numa hmat-cache,node-id=1,size=10K,level=1,assoc=direct,"
+                  "policy=write-back,line=8",
+                  &data);
+    free_test_data(&data);
+}
+
+static void test_acpi_q35_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_Q35);
+}
+
+static void test_acpi_piix4_tcg_acpi_hmat(void)
+{
+    test_acpi_tcg_acpi_hmat(MACHINE_PC);
+}
+
 static void test_acpi_virt_tcg(void)
 {
 test_data data = {
@@ -991,6 +1033,8 @@ int main(int argc, char *argv[])
 qtest_add_func("acpi/q35/numamem", test_acpi_q35_tcg_numamem);
 qtest_add_func("acpi/piix4/dimmpxm", test_acpi_piix4_tcg_dimm_pxm);
 qtest_add_func("acpi/q35/dimmpxm", test_acpi_q35_tcg_dimm_pxm);
+qtest_add_func("acpi/piix4/acpihmat", test_acpi_piix4_tcg_acpi_hmat);
+qtest_add_func("acpi/q35/acpihmat", test_acpi_q35_tcg_acpi_hmat);
 } else if (strcmp(arch, "aarch64") == 0) {

[PATCH v3 3/7] gpiolib: Add support for GPIO line table lookup

2019-11-27 Thread Geert Uytterhoeven

Currently GPIOs can only be referred to by GPIO controller and offset in
GPIO lookup tables.

Add support for looking them up by line name.

Signed-off-by: Geert Uytterhoeven 
---
If this is rejected, the GPIO Aggregator documentation and code must be
updated.

v3:
  - New.
---
 drivers/gpio/gpiolib.c   | 12 
 include/linux/gpio/machine.h |  2 +-
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index d24a3d79dcfe69ad..cb608512ad6bbded 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -4475,6 +4475,18 @@ static struct gpio_desc *gpiod_find(struct device *dev, 
const char *con_id,
if (p->con_id && (!con_id || strcmp(p->con_id, con_id)))
continue;
 
+   if (p->chip_hwnum == (u16)-1) {
+   desc = gpio_name_to_desc(p->chip_label);
+   if (desc) {
+   *flags = p->flags;
+   return desc;
+   }
+
+   dev_warn(dev, "cannot find GPIO line %s, deferring\n",
+p->chip_label);
+   return ERR_PTR(-EPROBE_DEFER);
+   }
+
chip = find_chip_by_name(p->chip_label);
 
if (!chip) {
diff --git a/include/linux/gpio/machine.h b/include/linux/gpio/machine.h
index 1ebe5be05d5f81fa..84c1c097e55eefaf 100644
--- a/include/linux/gpio/machine.h
+++ b/include/linux/gpio/machine.h
@@ -31,7 +31,7 @@ enum gpio_lookup_flags {
  */
 struct gpiod_lookup {
const char *chip_label;
-   u16 chip_hwnum;
+   u16 chip_hwnum; /* if -1, chip_label is named line */
const char *con_id;
unsigned int idx;
unsigned long flags;
-- 
2.17.1
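
The (u16)-1 sentinel in this patch can be illustrated in isolation. The sketch below uses a simplified stand-in for struct gpiod_lookup (only two fields, with userspace types), and the helper name and the line name "LED6" are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for struct gpiod_lookup; only the fields used here. */
struct lookup_entry {
    const char *chip_label;  /* chip label, or line name if hwnum is the sentinel */
    uint16_t chip_hwnum;     /* offset within the chip, or (uint16_t)-1 */
};

/* True when the entry names a GPIO line rather than a chip plus offset. */
static bool lookup_is_line_name(const struct lookup_entry *e)
{
    /* The cast matters: a uint16_t promoted to int never equals plain -1. */
    return e->chip_hwnum == (uint16_t)-1;
}
```

An entry such as { "LED6", (uint16_t)-1 } is then treated as a line name, while { "e6052000.gpio", 19 } keeps the existing chip-plus-offset meaning. One consequence of this encoding is that offset 65535 can no longer be expressed in a lookup table.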

[PATCH v3 7/7] MAINTAINERS: Add GPIO Aggregator/Repeater section

2019-11-27 Thread Geert Uytterhoeven

Add a maintainership section for the GPIO Aggregator/Repeater, covering
documentation, Device Tree bindings, and driver source code.

Signed-off-by: Geert Uytterhoeven 
---
Harish: Do you want to be listed as maintainer, too?

v3:
  - New.
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index e5949b6827b72f2b..0f12ebdaa8faa76b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7043,6 +7043,14 @@ S:   Maintained
 F: Documentation/firmware-guide/acpi/gpio-properties.rst
 F: drivers/gpio/gpiolib-acpi.c
 
+GPIO AGGREGATOR/REPEATER
+M: Geert Uytterhoeven 
+L: linux-g...@vger.kernel.org
+S: Maintained
+F: Documentation/admin-guide/gpio/gpio-aggregator.rst
+F: Documentation/devicetree/bindings/gpio/gpio-repeater.yaml
+F: drivers/gpio/gpio-aggregator.c
+
 GPIO IR Transmitter
 M: Sean Young 
 L: linux-me...@vger.kernel.org
-- 
2.17.1

[PATCH v3 2/7] gpiolib: Add support for gpiochipN-based table lookup

2019-11-27 Thread Geert Uytterhoeven

Currently GPIO controllers can only be referred to by label in GPIO
lookup tables.

Add support for looking them up by "gpiochipN" name, with "N" either the
corresponding GPIO device's ID number, or the GPIO controller's first
GPIO number.

Signed-off-by: Geert Uytterhoeven 
---
If this is rejected, the GPIO Aggregator documentation must be updated.

The second variant is currently used by the legacy sysfs interface only,
so perhaps the chip->base check should be dropped?

v3:
  - New.
---
 drivers/gpio/gpiolib.c | 22 +-
 1 file changed, 21 insertions(+), 1 deletion(-)

diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index c9e47620d2434983..d24a3d79dcfe69ad 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1746,9 +1746,29 @@ static int gpiochip_match_name(struct gpio_chip *chip, 
void *data)
return !strcmp(chip->label, name);
 }
 
+static int gpiochip_match_id(struct gpio_chip *chip, void *data)
+{
+   int id = (uintptr_t)data;
+
+   return id == chip->base || id == chip->gpiodev->id;
+}
+
 static struct gpio_chip *find_chip_by_name(const char *name)
 {
-   return gpiochip_find((void *)name, gpiochip_match_name);
+   struct gpio_chip *chip;
+   int id;
+
+   chip = gpiochip_find((void *)name, gpiochip_match_name);
+   if (chip)
+   return chip;
+
+   if (!str_has_prefix(name, GPIOCHIP_NAME))
+   return NULL;
+
+   if (kstrtoint(name + strlen(GPIOCHIP_NAME), 10, &id))
+   return NULL;
+
+   return gpiochip_find((void *)(uintptr_t)id, gpiochip_match_id);
 }
 
 #ifdef CONFIG_GPIOLIB_IRQCHIP
-- 
2.17.1
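
The fallback path in find_chip_by_name() boils down to "strip the prefix, parse the rest as a decimal integer". A userspace approximation is sketched below, using strncmp() and strtol() in place of the kernel's str_has_prefix() and kstrtoint(); the function name is hypothetical and the accepted input differs slightly from kstrtoint() (for instance, no trailing newline is accepted here).

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define GPIOCHIP_NAME "gpiochip"

/*
 * Userspace approximation of the name parsing in the patch: returns 0 and
 * stores N in *id when name is "gpiochipN", or -1 otherwise.
 */
static int parse_gpiochip_id(const char *name, int *id)
{
    size_t prefix_len = strlen(GPIOCHIP_NAME);
    char *end;
    long val;

    if (strncmp(name, GPIOCHIP_NAME, prefix_len) != 0)
        return -1;

    errno = 0;
    val = strtol(name + prefix_len, &end, 10);
    /* Reject overflow, an empty number, and trailing garbage. */
    if (errno || end == name + prefix_len || *end != '\0')
        return -1;
    if (val < INT_MIN || val > INT_MAX)
        return -1;

    *id = (int)val;
    return 0;
}
```

With this, "gpiochip8" yields 8, while "gpiochip", "gpiochip9x", and a plain label such as "e6052000.gpio" are rejected. In the patch the parsed number is then matched against both chip->base and gpiodev->id.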

Re: [PATCH 9/9] monitor/hmp: Prefer to use hmp_handle_error for error reporting in block hmp commands

2019-11-27 Thread Markus Armbruster

Title is too long.  blockdev-hmp-cmds.c will become
block/monitor/block-hmp-cmds.c in v2.  With this in mind, suggest

block/monitor: Prefer to use hmp_handle_error() to report HMP errors

Maxim Levitsky  writes:

> This way they all will be prefixed with 'Error:' which some parsers
> (e.g libvirt need)

Sadly, "all" is far from true.  Consider

    void hmp_drive_add(Monitor *mon, const QDict *qdict)
    {
        Error *err = NULL;
        DriveInfo *dinfo = NULL;
        QemuOpts *opts;
        MachineClass *mc;
        const char *optstr = qdict_get_str(qdict, "opts");
        bool node = qdict_get_try_bool(qdict, "node", false);

        if (node) {
            hmp_drive_add_node(mon, optstr);
            return;
        }

        opts = drive_def(optstr);
        if (!opts)
            return;

hmp_drive_add_node() uses error_report() and error_report_err().  Easy
enough to fix if you move the function here, as I suggested in my review
of PATCH 8.

drive_def() is a wrapper around qemu_opts_parse_noisily(), which uses
error_report_err().  You can't change qemu_opts_parse_noisily() to use
hmp_handle_error().  You'd have to convert drive_def() to Error, which
involves switching it to qemu_opts_parse() + qemu_opts_print_help().

These are just the first two error paths in this file.  There's much
more.  Truly routing all HMP errors through hmp_handle_error() takes a
*massive* Error conversion effort, with a high risk of missing Error
conversions, followed by a never-ending risk of non-Error stuff creeping
in.

There must be an easier way.

Consider vreport():

    switch (type) {
    case REPORT_TYPE_ERROR:
        break;
    case REPORT_TYPE_WARNING:
        error_printf("warning: ");
        break;
    case REPORT_TYPE_INFO:
        error_printf("info: ");
        break;
    }

Adding the prefix here (either unconditionally, or if cur_mon) covers
all HMP errors reported with error_report() & friends in one blow.

That leaves the ones that are still reported with monitor_printf().
Converting those to error_report() looks far more tractable to me.
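
As a rough standalone illustration of the idea above (not QEMU code), the unconditional variant could look like the sketch below. The enum mirrors the names in the vreport() excerpt; report_prefix() and format_report() are hypothetical helpers, snprintf() stands in for error_printf(), and the "Error: " spelling is taken from the commit message quoted earlier.

```c
#include <stdio.h>

enum report_type { REPORT_TYPE_ERROR, REPORT_TYPE_WARNING, REPORT_TYPE_INFO };

/* Hypothetical helper: the prefix per report type; "Error: " is the addition. */
static const char *report_prefix(enum report_type type)
{
    switch (type) {
    case REPORT_TYPE_ERROR:
        return "Error: ";
    case REPORT_TYPE_WARNING:
        return "warning: ";
    case REPORT_TYPE_INFO:
        return "info: ";
    }
    return "";
}

/* Format one report line into buf; snprintf() stands in for error_printf(). */
static int format_report(char *buf, size_t len, enum report_type type,
                         const char *msg)
{
    return snprintf(buf, len, "%s%s\n", report_prefix(type), msg);
}
```

The conditional variant ("if cur_mon") would add the prefix only when a monitor is active; that check is left out here because it depends on QEMU internals.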

> Signed-off-by: Maxim Levitsky 
> ---
>  blockdev-hmp-cmds.c | 35 +--
>  1 file changed, 21 insertions(+), 14 deletions(-)
>
> diff --git a/blockdev-hmp-cmds.c b/blockdev-hmp-cmds.c
> index c943dccd03..197994716f 100644
> --- a/blockdev-hmp-cmds.c
> +++ b/blockdev-hmp-cmds.c
> @@ -59,7 +59,6 @@ void hmp_drive_add(Monitor *mon, const QDict *qdict)
>  mc = MACHINE_GET_CLASS(current_machine);
>  dinfo = drive_new(opts, mc->block_default_type, &err);
>  if (err) {
> -error_report_err(err);
>  qemu_opts_del(opts);
>  goto err;
>  }
> @@ -73,7 +72,7 @@ void hmp_drive_add(Monitor *mon, const QDict *qdict)
>  monitor_printf(mon, "OK\n");
>  break;
>  default:
> -monitor_printf(mon, "Can't hot-add drive to type %d\n", dinfo->type);
> +error_setg(&err, "Can't hot-add drive to type %d", dinfo->type);
>  goto err;
>  }
>  return;
> @@ -84,6 +83,7 @@ err:
>  monitor_remove_blk(blk);
>  blk_unref(blk);
>  }
> +hmp_handle_error(mon, &err);
>  }
>  
>  void hmp_drive_del(Monitor *mon, const QDict *qdict)
> @@ -105,14 +105,14 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
>  
>  blk = blk_by_name(id);
>  if (!blk) {
> -error_report("Device '%s' not found", id);
> -return;
> +error_setg(&local_err, "Device '%s' not found", id);
> +goto err;

Having to create Error objects just so we can use hmp_handle_error() is
awkward.  Tolerable if using hmp_handle_error() improves matters.  I'm
not sure it does.

>  }
>  
>  if (!blk_legacy_dinfo(blk)) {
> -error_report("Deleting device added with blockdev-add"
> - " is not supported");
> -return;
> +error_setg(&local_err,
> +   "Deleting device added with blockdev-add is not 
> supported");
> +goto err;
>  }
>  
>  aio_context = blk_get_aio_context(blk);
> @@ -121,9 +121,8 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
>  bs = blk_bs(blk);
>  if (bs) {
>  if (bdrv_op_is_blocked(bs, BLOCK_OP_TYPE_DRIVE_DEL, &local_err)) {
> -error_report_err(local_err);
>  aio_context_release(aio_context);
> -return;
> +goto err;
>  }
>  
>  blk_remove_bs(blk);
> @@ -144,12 +143,15 @@ void hmp_drive_del(Monitor *mon, const QDict *qdict)
>  }
>  
>  aio_context_release(aio_context);
> +err:
> +hmp_handle_error(mon, &local_err);
>  }
>  
>  void hmp_commit(Monitor *mon, const QDict *qdict)
>  {
>  const char *device = qdict_get_str(qdict, "device");
>  BlockBackend *blk;
> +Error *local_err = NULL;
>  int ret;
>  
>  if (!strcmp(device, "all")) {
> @@ -160,12 +162,12 @@ void hmp_commit(Monitor *mon, const QDict *qdict)
>  
>  blk = blk_by_name(device

[PATCH v3 1/7] gpiolib: Add GPIOCHIP_NAME definition

2019-11-27 Thread Geert Uytterhoeven

The string literal "gpiochip" is used in several places.
Add a definition for it, and use it everywhere, to make sure everything
stays in sync.

Signed-off-by: Geert Uytterhoeven 
---
v3:
  - New.
---
 drivers/gpio/gpiolib-sysfs.c | 7 +++
 drivers/gpio/gpiolib.c   | 4 ++--
 drivers/gpio/gpiolib.h   | 2 ++
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/gpio/gpiolib-sysfs.c b/drivers/gpio/gpiolib-sysfs.c
index fbf6b1a0a4fae6ce..23e3d335cd543d53 100644
--- a/drivers/gpio/gpiolib-sysfs.c
+++ b/drivers/gpio/gpiolib-sysfs.c
@@ -762,10 +762,9 @@ int gpiochip_sysfs_register(struct gpio_device *gdev)
parent = &gdev->dev;
 
/* use chip->base for the ID; it's already known to be unique */
-   dev = device_create_with_groups(&gpio_class, parent,
-   MKDEV(0, 0),
-   chip, gpiochip_groups,
-   "gpiochip%d", chip->base);
+   dev = device_create_with_groups(&gpio_class, parent, MKDEV(0, 0), chip,
+   gpiochip_groups, GPIOCHIP_NAME "%d",
+   chip->base);
if (IS_ERR(dev))
return PTR_ERR(dev);
 
diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c
index dce0b31f4125a6b3..c9e47620d2434983 100644
--- a/drivers/gpio/gpiolib.c
+++ b/drivers/gpio/gpiolib.c
@@ -1419,7 +1419,7 @@ int gpiochip_add_data_with_key(struct gpio_chip *chip, 
void *data,
ret = gdev->id;
goto err_free_gdev;
}
-   dev_set_name(&gdev->dev, "gpiochip%d", gdev->id);
+   dev_set_name(&gdev->dev, GPIOCHIP_NAME "%d", gdev->id);
device_initialize(&gdev->dev);
dev_set_drvdata(&gdev->dev, gdev);
if (chip->parent && chip->parent->driver)
@@ -5105,7 +5105,7 @@ static int __init gpiolib_dev_init(void)
return ret;
}
 
-   ret = alloc_chrdev_region(&gpio_devt, 0, GPIO_DEV_MAX, "gpiochip");
+   ret = alloc_chrdev_region(&gpio_devt, 0, GPIO_DEV_MAX, GPIOCHIP_NAME);
if (ret < 0) {
pr_err("gpiolib: failed to allocate char dev region\n");
bus_unregister(&gpio_bus_type);
diff --git a/drivers/gpio/gpiolib.h b/drivers/gpio/gpiolib.h
index ca9bc1e4803c2979..a4a759920faa48ab 100644
--- a/drivers/gpio/gpiolib.h
+++ b/drivers/gpio/gpiolib.h
@@ -16,6 +16,8 @@
 #include 
 #include 
 
+#define GPIOCHIP_NAME  "gpiochip"
+
 /**
  * struct gpio_device - internal state container for GPIO devices
  * @id: numerical ID number for the GPIO chip
-- 
2.17.1
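
The replacement format strings rely on C's concatenation of adjacent string literals, so GPIOCHIP_NAME "%d" is the same format as "gpiochip%d". A small userspace sketch, with a hypothetical helper name and snprintf() in place of the kernel calls:

```c
#include <stdio.h>

#define GPIOCHIP_NAME "gpiochip"

/* Build a device name such as "gpiochip8" into buf; returns snprintf()'s result. */
static int format_gpiochip_name(char *buf, size_t len, int id)
{
    /* Adjacent string literals are concatenated: the format is "gpiochip%d". */
    return snprintf(buf, len, GPIOCHIP_NAME "%d", id);
}
```

Calling format_gpiochip_name(buf, sizeof(buf), 8) writes "gpiochip8" into buf. This only works because GPIOCHIP_NAME is a macro expanding to a literal; a const char * variable could not be spliced into the format this way.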

[PATCH v3 4/7] dt-bindings: gpio: Add gpio-repeater bindings

2019-11-27 Thread Geert Uytterhoeven

Add Device Tree bindings for a GPIO repeater, with optional translation
of physical signal properties.  This is useful for describing explicitly
the presence of e.g. an inverter on a GPIO line, and was inspired by the
non-YAML gpio-inverter bindings by Harish Jenny K N
[1].

Note that this is different from a GPIO Nexus Node[2], which cannot do
physical signal property translation.

While an inverter can be described implicitly by exchanging the
GPIO_ACTIVE_HIGH and GPIO_ACTIVE_LOW flags, this has its limitations.
Each GPIO line has only a single GPIO_ACTIVE_* flag, but it applies to
both the provider and consumer sides:
  1. The GPIO provider (controller) looks at the flags to know the
 polarity, so it can translate between logical (active/not active)
 and physical (high/low) signal levels.
  2. While the signal polarity is usually fixed on the GPIO consumer
 side (e.g. an LED is tied to either the supply voltage or GND),
 it may be configurable on some devices, and both sides need to
 agree.  Hence the GPIO_ACTIVE_* flag as seen by the consumer must
 match the actual polarity.
 There exists a similar issue with interrupt flags, where both the
 interrupt controller and the device generating the interrupt need
 to agree, which breaks in the presence of a physical inverter not
 described in DT (see e.g. [3]).
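
The logical/physical translation in point 1 can be written down in a few lines. This is a minimal sketch with hypothetical helper names, not gpiolib code: an active-low line inverts the mapping, and a physical inverter in the path flips the polarity seen on the other side.

```c
#include <stdbool.h>

/* Translate a logical value (active / not active) to a physical level (high / low). */
static bool logical_to_physical(bool active, bool active_low)
{
    return active != active_low;
}

/* Polarity as seen through a physical inverter: the flag on the far side is flipped. */
static bool invert_polarity(bool active_low)
{
    return !active_low;
}
```

With a single flag per line, the provider and the consumer cannot each be given their own polarity; describing the inverter as a separate node gives each side a flag of its own.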

[1] "[PATCH V4 2/2] gpio: inverter: document the inverter bindings"

https://lore.kernel.org/linux-gpio/1561699236-18620-3-git-send-email-harish_kand...@mentor.com/

[2] Devicetree Specification v0.3-rc2, Section 2.5

https://github.com/devicetree-org/devicetree-specification/releases/tag/v0.3-rc2

[3] "[PATCH] wlcore/wl18xx: Add invert-irq OF property for physically
inverted IRQ"

https://lore.kernel.org/linux-renesas-soc/20190607172958.20745-1-ero...@de.adit-jv.com/

Signed-off-by: Geert Uytterhoeven 
---
v3:
  - New.
---
 .../bindings/gpio/gpio-repeater.yaml  | 53 +++
 1 file changed, 53 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/gpio/gpio-repeater.yaml

diff --git a/Documentation/devicetree/bindings/gpio/gpio-repeater.yaml 
b/Documentation/devicetree/bindings/gpio/gpio-repeater.yaml
new file mode 100644
index ..efdee0c3be43f731
--- /dev/null
+++ b/Documentation/devicetree/bindings/gpio/gpio-repeater.yaml
@@ -0,0 +1,53 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/gpio/gpio-repeater.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: GPIO Repeater
+
+maintainers:
+  - Harish Jenny K N 
+  - Geert Uytterhoeven 
+
+description:
+  This represents a repeater for one or more GPIOs, possibly including physical
+  signal property translation (e.g. polarity inversion).
+
+properties:
+  compatible:
+const: gpio-repeater
+
+  "#gpio-cells":
+const: 2
+
+  gpio-controller: true
+
+  gpios:
+description:
+  Phandle and specifier, one for each repeated GPIO.
+
+  gpio-line-names:
+description:
+  Strings defining the names of the GPIO lines going out of the GPIO
+  controller.
+
+required:
+  - compatible
+  - "#gpio-cells"
+  - gpio-controller
+  - gpios
+
+additionalProperties: false
+
+examples:
+  # Device node describing a polarity inverter for a single GPIO
+  - |
+#include 
+
+inverter: gpio-repeater {
+compatible = "gpio-repeater";
+#gpio-cells = <2>;
+gpio-controller;
+gpios = <&gpio 95 GPIO_ACTIVE_LOW>;
+};
-- 
2.17.1

[PATCH v3 0/7] gpio: Add GPIO Aggregator/Repeater

2019-11-27 Thread Geert Uytterhoeven

Hi all,

GPIO controllers are exported to userspace using /dev/gpiochip*
character devices.  Access control to these devices is provided by
standard UNIX file system permissions, on an all-or-nothing basis:
either a GPIO controller is accessible for a user, or it is not.
Currently no mechanism exists to control access to individual GPIOs.

Hence this adds a GPIO driver to aggregate existing GPIOs, and expose
them as a new gpiochip.  This is useful for implementing access control,
and assigning a set of GPIOs to a specific user.  Furthermore, this
simplifies and hardens exporting GPIOs to a virtual machine, as the VM
can just grab the full GPIO controller, and no longer needs to care
about which GPIOs to grab and which not, reducing the attack surface.

Recently, other use cases have been discovered[1]:
  - Describing GPIO inverters in DT, as a generic GPIO Repeater,
  - Describing simple GPIO-operated devices in DT, and using the GPIO
Aggregator as a generic GPIO driver for userspace.

Changes compared to v2[2] (more details in the individual patches):
  - Integrate GPIO Repeater functionality,
  - Absorb GPIO forwarder library, as the Aggregator and Repeater are
now a single driver,
  - Use the aggregator parameters to create a GPIO lookup table instead
of an array of GPIO descriptors,
  - Add documentation,
  - New patches:
  - "gpiolib: Add GPIOCHIP_NAME definition",
  - "gpiolib: Add support for gpiochipN-based table lookup",
  - "gpiolib: Add support for GPIO line table lookup",
  - "dt-bindings: gpio: Add gpio-repeater bindings",
  - "docs: gpio: Add GPIO Aggregator/Repeater documentation",
  - "MAINTAINERS: Add GPIO Aggregator/Repeater section".
  - Dropped patches:
  - "gpio: Export gpiod_{request,free}() to modular GPIO code",
  - "gpio: Export gpiochip_get_desc() to modular GPIO code",
  - "gpio: Export gpio_name_to_desc() to modular GPIO code",
  - "gpio: Add GPIO Forwarder Helper".

Changes compared to v1[3]:
  - Drop "virtual", rename to gpio-aggregator,
  - Create and use new GPIO Forwarder Helper, to allow sharing code with
the GPIO inverter,
  - Lift limit on the maximum number of GPIOs,
  - Improve parsing of GPIO specifiers,
  - Fix modular build.

Aggregating GPIOs and exposing them as a new gpiochip was suggested in
response to my proof-of-concept for GPIO virtualization with QEMU[4][5].

For the first use case, aggregated GPIO controllers are instantiated and
destroyed by writing to attribute files in sysfs.
Sample session on the Renesas Koelsch development board:

  - Unbind LEDs from leds-gpio driver:

echo leds > /sys/bus/platform/drivers/leds-gpio/unbind

  - Create aggregators:

$ echo e6052000.gpio 19,20 \
> /sys/bus/platform/drivers/gpio-aggregator/new_device

gpio-aggregator gpio-aggregator.0: gpio 0 => gpio-953 (gpio-aggregator.0)
gpio-aggregator gpio-aggregator.0: gpio 1 => gpio-954 (gpio-aggregator.0)
gpiochip_find_base: found new base at 778
gpio gpiochip8: (gpio-aggregator.0): added GPIO chardev (254:8)
gpiochip_setup_dev: registered GPIOs 778 to 779 on device: gpiochip8 
(gpio-aggregator.0)

$ echo e6052000.gpio 21 e605.gpio 20-22 \
> /sys/bus/platform/drivers/gpio-aggregator/new_device

gpio-aggregator gpio-aggregator.1: gpio 0 => gpio-955 (gpio-aggregator.1)
gpio-aggregator gpio-aggregator.1: gpio 1 => gpio-1012 (gpio-aggregator.1)
gpio-aggregator gpio-aggregator.1: gpio 2 => gpio-1013 (gpio-aggregator.1)
gpio-aggregator gpio-aggregator.1: gpio 3 => gpio-1014 (gpio-aggregator.1)
gpiochip_find_base: found new base at 774
gpio gpiochip9: (gpio-aggregator.1): added GPIO chardev (254:9)
gpiochip_setup_dev: registered GPIOs 774 to 777 on device: gpiochip9 
(gpio-aggregator.1)

  - Adjust permissions on /dev/gpiochip[89] (optional)

  - Control LEDs:

$ gpioset gpiochip8 0=0 1=1 # LED6 OFF, LED7 ON
$ gpioset gpiochip8 0=1 1=0 # LED6 ON, LED7 OFF
$ gpioset gpiochip9 0=0 # LED8 OFF
$ gpioset gpiochip9 0=1 # LED8 ON

  - Destroy aggregators:

$ echo gpio-aggregator.0 \
> /sys/bus/platform/drivers/gpio-aggregator/delete_device
$ echo gpio-aggregator.1 \
> /sys/bus/platform/drivers/gpio-aggregator/delete_device

Thanks for your comments!

References:
  [1] "[PATCH V4 2/2] gpio: inverter: document the inverter bindings"
  
(https://lore.kernel.org/linux-gpio/1561699236-18620-3-git-send-email-harish_kand...@mentor.com/)
  [2] "[PATCH/RFC v2 0/5] gpio: Add GPIO Aggregator Driver"
  
(https://lore.kernel.org/linux-gpio/20190911143858.13024-1-geert+rene...@glider.be/)
  [3] "[PATCH RFC] gpio: Add Virtual Aggregator GPIO Driver"
  
(https://lore.kernel.org/lkml/20190705160536.12047-1-geert+rene...@glider.be/)
  [4] "[PATCH QEMU POC] Add a GPIO backend"
  
(https://lore.kernel.org/linux-renesas-soc/20181003152521.23144-1-geert+rene...@glider.be/)
  [5] "Getting To Blinky:

[PATCH v3 6/7] docs: gpio: Add GPIO Aggregator/Repeater documentation

2019-11-27 Thread Geert Uytterhoeven

Document the GPIO Aggregator/Repeater, and the three typical use-cases.

Signed-off-by: Geert Uytterhoeven 
---
v3:
  - New.
---
 .../admin-guide/gpio/gpio-aggregator.rst  | 111 ++
 Documentation/admin-guide/gpio/index.rst  |   1 +
 2 files changed, 112 insertions(+)
 create mode 100644 Documentation/admin-guide/gpio/gpio-aggregator.rst

diff --git a/Documentation/admin-guide/gpio/gpio-aggregator.rst 
b/Documentation/admin-guide/gpio/gpio-aggregator.rst
new file mode 100644
index ..826146e260253299
--- /dev/null
+++ b/Documentation/admin-guide/gpio/gpio-aggregator.rst
@@ -0,0 +1,111 @@
+.. SPDX-License-Identifier: GPL-2.0-only
+
+GPIO Aggregator/Repeater
+========================
+
+The GPIO Aggregator/Repeater allows to aggregate GPIOs, and expose them as a
+new gpio_chip.  This supports the following use cases.
+
+
+Aggregating GPIOs using Sysfs
+-----------------------------
+
+GPIO controllers are exported to userspace using /dev/gpiochip* character
+devices.  Access control to these devices is provided by standard UNIX file
+system permissions, on an all-or-nothing basis: either a GPIO controller is
+accessible for a user, or it is not.
+
+The GPIO Aggregator allows access control for individual GPIOs, by aggregating
+them into a new gpio_chip, which can be assigned to a group or user using
+standard UNIX file ownership and permissions.  Furthermore, this simplifies and
+hardens exporting GPIOs to a virtual machine, as the VM can just grab the full
+GPIO controller, and no longer needs to care about which GPIOs to grab and
+which not, reducing the attack surface.
+
+Aggregated GPIO controllers are instantiated and destroyed by writing to
+write-only attribute files in sysfs.
+
+/sys/bus/platform/drivers/gpio-aggregator/
+
+   "new_device" ...
+   Userspace may ask the kernel to instantiate an aggregated GPIO
+   controller by writing a string describing the GPIOs to
+   aggregate to the "new_device" file, using the format
+
+   .. code-block:: none
+
+   [] [ ] ...
+
+   Where:
+
+   "" ...
+   is a GPIO line name,
+
+   "" ...
+   is a GPIO chip label or name, and
+
+   "" ...
+   is a comma-separated list of GPIO offsets and/or
+   GPIO offset ranges denoted by dashes.
+
+   Example: Instantiate a new GPIO aggregator by aggregating GPIO
+   19 of "e6052000.gpio" and GPIOs 20-21 of "gpiochip2" into a new
+   gpio_chip:
+
+   .. code-block:: bash
+
+   echo 'e6052000.gpio 19 gpiochip2 20-21' > new_device
+
+   "delete_device" ...
+   Userspace may ask the kernel to destroy an aggregated GPIO
+   controller after use by writing its device name to the
+   "delete_device" file.
+
+   Example: Destroy the previously-created aggregated GPIO
+   controller "gpio-aggregator.0":
+
+   .. code-block:: bash
+
+   echo gpio-aggregator.0 > delete_device
+
+
+GPIO Repeater in Device Tree
+----------------------------
+
+A GPIO Repeater is a node in a Device Tree representing a repeater for one or
+more GPIOs, possibly including physical signal property translation (e.g.
+polarity inversion).  This allows to model e.g. inverters in DT.
+
+See Documentation/devicetree/bindings/gpio/gpio-repeater.yaml
+
+
+Generic GPIO Driver
+-------------------
+
+The GPIO Aggregator can also be used as a generic driver for a simple
+GPIO-operated device described in DT, without a dedicated in-kernel driver.
+This is not unlike e.g. spidev, which allows communicating with an SPI device
+from userspace.
+
+Binding a device to the GPIO Aggregator is performed either by modifying the
+gpio-aggregator driver, or by writing to the "driver_override" file in Sysfs.
+
+Example: If "frobnicator" is a GPIO-operated device described in DT, using its
+own compatible value::
+
+frobnicator {
+compatible = "myvendor,frobnicator";
+
+gpios = <&gpio2 19 GPIO_ACTIVE_HIGH>,
+<&gpio2 20 GPIO_ACTIVE_LOW>;
+};
+
+it can be bound to the GPIO Aggregator by either:
+
+1. Adding its compatible value to ``gpio_aggregator_dt_ids[]``,
+2. Binding manually using "driver_override":
+
+.. code-block:: bash
+
+echo gpio-aggregator > /sys/bus/platform/devices/frobnicator/driver_override
+echo frobnicator > /sys/bus/platform/drivers/gpio-aggregator/bind
diff --git a/Documentation/admin-guide/gpio/index.rst b/Documentation/admin-guide/gpio/index.rst
index a244ba4e87d5398a..ef2838638e96 100644
--- a/Documentation/admin-guide/gpio/index.rst
+++ b/Documentation/admin-guide/gpio/index.rst
@@ -7,6 +7,7 @@ gpio
 .. toctree::
 :maxdepth: 1
 
+gpio-aggregator

[PATCH v3 5/7] gpio: Add GPIO Aggregator/Repeater driver

2019-11-27 Thread Geert Uytterhoeven

GPIO controllers are exported to userspace using /dev/gpiochip*
character devices.  Access control to these devices is provided by
standard UNIX file system permissions, on an all-or-nothing basis:
either a GPIO controller is accessible for a user, or it is not.
Currently no mechanism exists to control access to individual GPIOs.

Hence add a GPIO driver to aggregate existing GPIOs, and expose them as
a new gpiochip.

This supports the following use cases:
  1. Aggregating GPIOs using Sysfs
 This is useful for implementing access control, and assigning a set
 of GPIOs to a specific user or virtual machine.

  2. GPIO Repeater in Device Tree
 This supports modelling e.g. GPIO inverters in DT.

  3. Generic GPIO Driver
 This provides userspace access to a simple GPIO-operated device
 described in DT, cfr. e.g. spidev for SPI-operated devices.

Signed-off-by: Geert Uytterhoeven 
---
v3:
  - Absorb GPIO forwarder,
  - Integrate GPIO Repeater and Generic GPIO driver functionality,
  - Use the aggregator parameters to create a GPIO lookup table instead
of an array of GPIO descriptors, which allows to simplify the code:
  1. This removes the need for calling gpio_name_to_desc(),
 gpiochip_find(), gpiochip_get_desc(), and gpiod_request(),
  2. This allows the platform device to always use
 devm_gpiod_get_index(), regardless of the origin of the GPIOs,
  - Move parameter parsing from platform device probe to sysfs attribute
store, removing the need for platform data passing,
  - Use more devm_*() functions to simplify cleanup,
  - Add pr_fmt(),
  - General refactoring.

v2:
  - Add missing initialization of i in gpio_virt_agg_probe(),
  - Update for removed .need_valid_mask field and changed
.init_valid_mask() signature,
  - Drop "virtual", rename to gpio-aggregator,
  - Drop bogus FIXME related to gpiod_set_transitory() expectations,
  - Use new GPIO Forwarder Helper,
  - Lift limit on the maximum number of GPIOs,
  - Improve parsing:
  - add support for specifying GPIOs by line name,
  - add support for specifying GPIO chips by ID,
  - add support for GPIO offset ranges,
  - names and offset specifiers must be separated by whitespace,
  - GPIO offsets must be separated by spaces,
  - Use str_has_prefix() and kstrtouint().
---
 drivers/gpio/Kconfig   |  13 +
 drivers/gpio/Makefile  |   1 +
 drivers/gpio/gpio-aggregator.c | 587 +
 3 files changed, 601 insertions(+)
 create mode 100644 drivers/gpio/gpio-aggregator.c

diff --git a/drivers/gpio/Kconfig b/drivers/gpio/Kconfig
index 8adffd42f8cb0559..36b6b57a6b05e906 100644
--- a/drivers/gpio/Kconfig
+++ b/drivers/gpio/Kconfig
@@ -1507,6 +1507,19 @@ config GPIO_VIPERBOARD
 
 endmenu
 
+config GPIO_AGGREGATOR
+   tristate "GPIO Aggregator/Repeater"
+   help
+ Say yes here to enable the GPIO Aggregator and repeater, which
+ provides a way to aggregate and/or repeat existing GPIOs into a new
+ GPIO device.
+ This can serve the following purposes:
+   1. Assign a collection of GPIOs to a user, or export them to a
+  virtual machine,
+   2. Support GPIOs that are connected to a physical inverter,
+   3. Provide a generic driver for a GPIO-operated device, to be
+   controlled from userspace using the GPIO chardev interface.
+
 config GPIO_MOCKUP
tristate "GPIO Testing Driver"
select IRQ_SIM
diff --git a/drivers/gpio/Makefile b/drivers/gpio/Makefile
index 34eb8b2b12dd656c..f9971eeb1f32335f 100644
--- a/drivers/gpio/Makefile
+++ b/drivers/gpio/Makefile
@@ -25,6 +25,7 @@ obj-$(CONFIG_GPIO_74XX_MMIO)  += gpio-74xx-mmio.o
 obj-$(CONFIG_GPIO_ADNP)+= gpio-adnp.o
 obj-$(CONFIG_GPIO_ADP5520) += gpio-adp5520.o
 obj-$(CONFIG_GPIO_ADP5588) += gpio-adp5588.o
+obj-$(CONFIG_GPIO_AGGREGATOR)  += gpio-aggregator.o
 obj-$(CONFIG_GPIO_ALTERA_A10SR)+= gpio-altera-a10sr.o
 obj-$(CONFIG_GPIO_ALTERA)  += gpio-altera.o
 obj-$(CONFIG_GPIO_AMD8111) += gpio-amd8111.o
diff --git a/drivers/gpio/gpio-aggregator.c b/drivers/gpio/gpio-aggregator.c
new file mode 100644
index ..873578c6f9683db8
--- /dev/null
+++ b/drivers/gpio/gpio-aggregator.c
@@ -0,0 +1,587 @@
+// SPDX-License-Identifier: GPL-2.0-only
+//
+// GPIO Aggregator and Repeater
+//
+// Copyright (C) 2019 Glider bvba
+
+#define DRV_NAME   "gpio-aggregator"
+#define pr_fmt(fmt)DRV_NAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "gpiolib.h"
+
+
+   /*
+* GPIO Aggregator sysfs interface
+*/
+
+struct gpio_aggregator {
+   struct gpiod_lookup_table *lookups;
+   struct platform_device *pdev;
+   char args[];
+};
+
+static DEFINE_MUTEX(gpio_aggregator_lo

Re: [PATCH 0/6] Enable Travis builds on arm64, ppc64le and s390x

2019-11-27 Thread Thomas Huth


On 25/11/2019 11.28, Alex Bennée wrote:
> Alex Bennée  writes:
>> Thomas Huth  writes:
>>> Travis recently added build hosts for arm64, ppc64le and s390x, so
>>> this is a welcome addition to our Travis testing matrix.
>>>
>>> Unfortunately, the builds are running in quite restricted LXD containers
>>> there, for example it is not possible to create huge files there (even
>>> if they are just sparse), and certain system calls are blocked. So we
>>> have to change some tests first to stop them failing in such environments.
>>>
>>>   iotests: Skip test 060 if it is not possible to create large files
>>>   iotests: Skip test 079 if it is not possible to create large files
>>
>> It seems like 161 is also failing:
>>
>>   https://travis-ci.org/stsquad/qemu/jobs/615672478
>
> And sometimes 249


These must be intermittent problems ... I've seen 161 failing once at 
the very beginning of my tests, but then never again, so I assumed that 
it was a quirk with the test system that got fixed later. Seems like 
that was a wrong assumption. I've never seen 249 failing so far... I'll 
try to do some more tests when I've got some spare time...


 Thomas

Re: [PATCH v6 16/20] ppc/xive: Introduce a xive_tctx_ipb_update() helper

2019-11-27 Thread Greg Kurz

On Mon, 25 Nov 2019 07:58:16 +0100
Cédric Le Goater  wrote:

> We will use it to resend missed interrupts when a vCPU context is
> pushed on a HW thread.
> 
> Signed-off-by: Cédric Le Goater 
> ---
>  include/hw/ppc/xive.h |  1 +
>  hw/intc/xive.c| 21 +++--
>  2 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 24315480e7c2..9c0bf2c301e2 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -469,6 +469,7 @@ void xive_tctx_pic_print_info(XiveTCTX *tctx, Monitor *mon);
>  Object *xive_tctx_create(Object *cpu, XiveRouter *xrtr, Error **errp);
>  void xive_tctx_reset(XiveTCTX *tctx);
>  void xive_tctx_destroy(XiveTCTX *tctx);
> +void xive_tctx_ipb_update(XiveTCTX *tctx, uint8_t ring, uint8_t ipb);
>  
>  /*
>   * KVM XIVE device helpers
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index 4bff3abdc3eb..7047e45daca1 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -47,12 +47,6 @@ static uint8_t ipb_to_pipr(uint8_t ibp)
>  return ibp ? clz32((uint32_t)ibp << 24) : 0xff;
>  }
>  
> -static void ipb_update(uint8_t *regs, uint8_t priority)
> -{
> -regs[TM_IPB] |= priority_to_ipb(priority);
> -regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]);
> -}
> -
>  static uint8_t exception_mask(uint8_t ring)
>  {
>  switch (ring) {
> @@ -135,6 +129,15 @@ static void xive_tctx_set_cppr(XiveTCTX *tctx, uint8_t ring, uint8_t cppr)
>  xive_tctx_notify(tctx, ring);
>  }
>  
> +void xive_tctx_ipb_update(XiveTCTX *tctx, uint8_t ring, uint8_t ipb)
> +{
> +uint8_t *regs = &tctx->regs[ring];
> +
> +regs[TM_IPB] |= ipb;
> +regs[TM_PIPR] = ipb_to_pipr(regs[TM_IPB]);
> +xive_tctx_notify(tctx, ring);
> +}
> +

Maybe rename the helper to xive_tctx_update_ipb_and_notify() to
make it clear this raises an irq in the end ?

This can be done as follow-up though and the rest looks good, so:

Reviewed-by: Greg Kurz 

>  static inline uint32_t xive_tctx_word2(uint8_t *ring)
>  {
>  return *((uint32_t *) &ring[TM_WORD2]);
> @@ -336,8 +339,7 @@ static void xive_tm_set_os_cppr(XivePresenter *xptr, XiveTCTX *tctx,
>  static void xive_tm_set_os_pending(XivePresenter *xptr, XiveTCTX *tctx,
> hwaddr offset, uint64_t value, unsigned size)
>  {
> -ipb_update(&tctx->regs[TM_QW1_OS], value & 0xff);
> -xive_tctx_notify(tctx, TM_QW1_OS);
> +xive_tctx_ipb_update(tctx, TM_QW1_OS, priority_to_ipb(value & 0xff));
>  }
>  
>  static void xive_os_cam_decode(uint32_t cam, uint8_t *nvt_blk,
> @@ -1429,8 +1431,7 @@ static bool xive_presenter_notify(uint8_t format,
>  
>  /* handle CPU exception delivery */
>  if (count) {
> -ipb_update(&match.tctx->regs[match.ring], priority);
> -xive_tctx_notify(match.tctx, match.ring);
> +xive_tctx_ipb_update(match.tctx, match.ring, priority_to_ipb(priority));
>  }
>  
>  return !!count;
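
For readers following the review, the register update performed by the new helper can be sketched outside QEMU. This is a minimal standalone illustration, not the QEMU code: `priority_to_ipb()` is not shown in the quoted hunks, so its body and `PRIORITY_MAX` are assumptions, `clz32_sketch()` stands in for QEMU's `clz32()` using the GCC/Clang builtin, and `struct ring_regs` and `ring_ipb_update()` are hypothetical names. The notify step is reduced to returning the new PIPR value.

```c
#include <stdint.h>

/* Assumption: eight priority levels, 0 being the most favoured. */
#define PRIORITY_MAX 7

/* Assumed mapping: priority 0 -> bit 7 (0x80), priority 7 -> bit 0 (0x01). */
static uint8_t priority_to_ipb(uint8_t priority)
{
    return priority > PRIORITY_MAX
        ? 0
        : (uint8_t)(1u << (PRIORITY_MAX - priority));
}

/* Stand-in for clz32(); only ever called with a non-zero argument. */
static int clz32_sketch(uint32_t val)
{
    return __builtin_clz(val);
}

/* Same expression as in the quoted hunk: most favoured pending priority. */
static uint8_t ipb_to_pipr(uint8_t ipb)
{
    return ipb ? (uint8_t)clz32_sketch((uint32_t)ipb << 24) : 0xff;
}

/* Hypothetical two-register model of one thread context ring. */
struct ring_regs {
    uint8_t ipb;   /* pending priority bitmap */
    uint8_t pipr;  /* most favoured pending priority */
};

/* Accumulate pending bits, then recompute PIPR (notification omitted). */
static uint8_t ring_ipb_update(struct ring_regs *r, uint8_t ipb)
{
    r->ipb |= ipb;
    r->pipr = ipb_to_pipr(r->ipb);
    return r->pipr;
}
```

Because the pending bits are OR-ed in, a later, more favoured priority lowers PIPR while earlier pending bits stay recorded, which is why a single helper that also notifies fits both call sites in the patch.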

Re: [PATCH v18 0/8] Build ACPI Heterogeneous Memory Attribute Table (HMAT)

2019-11-27 Thread no-reply

Patchew URL: https://patchew.org/QEMU/20191127082613.22903-1-tao3...@intel.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    check-qtest-x86_64: tests/numa-test
Broken pipe
/tmp/qemu-test/src/tests/libqtest.c:149: kill_qemu() detected QEMU death from signal 8 (Floating point exception) (core dumped)
ERROR - too few tests run (expected 9, got 8)
make: *** [check-qtest-x86_64] Error 1
make: *** Waiting for unfinished jobs
  TEST    iotest-qcow2: 159
  TEST    iotest-qcow2: 161
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=ab54e713071349928d5255c2915756a6', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-6ggonmqq/src/docker-src.2019-11-27-03.40.08.6487:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=ab54e713071349928d5255c2915756a6
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-6ggonmqq/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    11m21.984s
user    0m8.633s


The full log is available at
http://patchew.org/logs/20191127082613.22903-1-tao3...@intel.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH] target/arm: Honor HCR_EL2.TID3 trapping requirements

2019-11-27 Thread Marc Zyngier


On 2019-11-26 21:04, Richard Henderson wrote:
> On 11/23/19 11:56 AM, Marc Zyngier wrote:
>> HCR_EL2.TID3 mandates that access from EL1 to a long list of id
>> registers traps to EL2, and QEMU has so far ignored this requirement.
>>
>> This breaks (among other things) KVM guests that have PtrAuth enabled,
>> while the hypervisor doesn't want to expose the feature to its guest.
>> To achieve this, KVM traps the ID registers (ID_AA64ISAR1_EL1 in this
>> case), and masks out the unsupported feature.
>>
>> QEMU not honoring the trap request means that the guest observes
>> that the feature is present in the HW, starts using it, and dies
>> a horrible death when KVM injects an UNDEF, because the feature
>> *really* isn't supported.
>>
>> Do the right thing by trapping to EL2 if HCR_EL2.TID3 is set.
>>
>> Reported-by: Will Deacon 
>> Signed-off-by: Marc Zyngier 
>> ---
>> There is a number of other trap bits missing (TID[0-2], for example),
>> but this at least gets a mainline Linux going with cpu=max.
>
> BTW, Peter, this appears to have been the bug that was causing me so many
> problems on my VHE branch.  Probably *exactly* this bug wrt ptrauth,
> since that would also be included with -cpu max.
>
> I am now able to boot a kvm guest kernel to the point of the no rootfs panic,
> which I wasn't before.
>
> I can only think that I mis-identified the true cause in Lyon.
>
> Anyway, thanks Marc!


Hehe, glad it fixed more than just my pet issue! :-)

M.
--
Jazz is not dead. It just smells funny...
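
The trap rule discussed in this thread ("trap to EL2 if HCR_EL2.TID3 is set") can be illustrated with a small standalone sketch. It is not the patch itself: `id_reg_access()` and `enum access_result` are hypothetical names, and the sketch ignores details such as whether EL2 is implemented or enabled in the current security state. The only architectural fact used is that TID3 is bit 18 of HCR_EL2.

```c
#include <stdint.h>

/* HCR_EL2.TID3 is bit 18 of the register. */
#define HCR_TID3 (1ULL << 18)

/* Hypothetical result codes, named for this sketch only. */
enum access_result {
    ACCESS_OK,
    ACCESS_TRAP_EL2
};

/*
 * Decide what happens when code at current_el reads a group 3 ID
 * register: an EL1 access traps to EL2 when TID3 is set, anything
 * else proceeds normally in this simplified model.
 */
static enum access_result id_reg_access(int current_el, uint64_t hcr_el2)
{
    if (current_el == 1 && (hcr_el2 & HCR_TID3)) {
        return ACCESS_TRAP_EL2;
    }
    return ACCESS_OK;
}
```

With the trap honoured, the hypervisor gets the chance to return a masked ID_AA64ISAR1_EL1 value, so the guest never sees a feature it must not use.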

Re: [PULL 0/5] i386 patches for QEMU 4.2-rc

2019-11-27 Thread Dr. David Alan Gilbert

* Philippe Mathieu-Daudé (phi...@redhat.com) wrote:
> On 11/26/19 10:19 AM, no-re...@patchew.org wrote:
> > Patchew URL: 
> > https://patchew.org/QEMU/20191126085936.1689-1-pbonz...@redhat.com/
> > 
> > This series failed the docker-quick@centos7 build test. Please find the 
> > testing commands and
> > their output below. If you have Docker installed, you can probably 
> > reproduce it
> > locally.
> > 
> > === TEST SCRIPT BEGIN ===
> > #!/bin/bash
> > make docker-image-centos7 V=1 NETWORK=1
> > time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> > === TEST SCRIPT END ===
> > 
> >   TEST    check-unit: tests/test-thread-pool
> > wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1
> > **
> > ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail:
> >  assertion failed: (result)
> > ERROR - Bail out! 
> > ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail:
> >  assertion failed: (result)
> > make: *** [check-qtest-aarch64] Error 1
> 
> Should we worry about this error?

Interesting; that should be fixed by Jens'
284f42a520cd9f5905abac2fa50397423890de8f - unless fix dev_unplug_pending
is still lying;  it's showing we're still landing in 'wait-unplug' on
aarch, because it's got a virtio-net by default; even though we've not
got a failover device setup.  CCing Jens.

Dave

> [...]
> > real    9m26.610s
> > user    0m8.328s
> > 
> > 
> > The full log is available at
> > http://patchew.org/logs/20191126085936.1689-1-pbonz...@redhat.com/testing.docker-quick@centos7/?type=message.
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PULL 0/5] i386 patches for QEMU 4.2-rc

2019-11-27 Thread Jens Freimann


On Wed, Nov 27, 2019 at 09:14:01AM +0000, Dr. David Alan Gilbert wrote:
> * Philippe Mathieu-Daudé (phi...@redhat.com) wrote:
> > On 11/26/19 10:19 AM, no-re...@patchew.org wrote:
> > > Patchew URL: https://patchew.org/QEMU/20191126085936.1689-1-pbonz...@redhat.com/
> > >
> > > This series failed the docker-quick@centos7 build test. Please find the testing commands and
> > > their output below. If you have Docker installed, you can probably reproduce it
> > > locally.
> > >
> > > === TEST SCRIPT BEGIN ===
> > > #!/bin/bash
> > > make docker-image-centos7 V=1 NETWORK=1
> > > time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> > > === TEST SCRIPT END ===
> > >
> > >   TEST    check-unit: tests/test-thread-pool
> > > wait_for_migration_fail: unexpected status status=wait-unplug allow_active=1
> > > **
> > > ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result)
> > > ERROR - Bail out! ERROR:/tmp/qemu-test/src/tests/migration-test.c:908:wait_for_migration_fail: assertion failed: (result)
> > > make: *** [check-qtest-aarch64] Error 1
> >
> > Should we worry about this error?
>
> Interesting; that should be fixed by Jens'
> 284f42a520cd9f5905abac2fa50397423890de8f - unless fix dev_unplug_pending
> is still lying;  it's showing we're still landing in 'wait-unplug' on
> aarch, because it's got a virtio-net by default; even though we've not
> got a failover device setup.  CCing Jens.


Hmm, I did test it. I'm looking into it.

regards
Jens

Re: [PATCH] usbredir: remove 'remote wake' capability from configuration descriptor

2019-11-27 Thread Markus Armbruster

Yuri Benditovich  writes:

> On Wed, Nov 27, 2019 at 8:36 AM Markus Armbruster  wrote:
>>
>> Yuri Benditovich  writes:
>>
>> > If the redirected device has this capability, Windows guest may
>> > place the device into D2 and expect it to wake when the device
>> > becomes active, but this will never happen. For example, when
>> > internal Bluetooth adapter is redirected, keyboards and mice
>> > connected to it do not work. Setting global property
>> > 'usb-redir.nowake=off' keeps 'remote wake' as is.
>>
>> "usb-redir.nowake=off" is a double negation.  Gets weirder when dusted
>> with syntactic sugar: "usb-redir.nonowake".  Can we think of a better
>> name?  Naming is hard...  What about "usb-redir.wakeup=on"?
> '"wakeup" is good but "wakeup=on" makes an impression that we add the
> capability to the device even if it does not have one.

True.

> disable_wake? suppress_wake? clear_wake? wake_allowed?

Let's have a look at what the property does:

>> > Signed-off-by: Yuri Benditovich 
>> > ---
>> >  hw/usb/redirect.c | 19 +++
>> >  1 file changed, 19 insertions(+)
>> >
>> > diff --git a/hw/usb/redirect.c b/hw/usb/redirect.c
>> > index e0f5ca6f81..e95898fe80 100644
>> > --- a/hw/usb/redirect.c
>> > +++ b/hw/usb/redirect.c
>> > @@ -113,6 +113,7 @@ struct USBRedirDevice {
>> >  /* Properties */
>> >  CharBackend cs;
>> >  bool enable_streams;
>> > +bool suppress_remote_wake;
>> >  uint8_t debug;
>> >  int32_t bootindex;
>> >  char *filter_str;
>> > @@ -1989,6 +1990,23 @@ static void usbredir_control_packet(void *priv, uint64_t id,
>> >  memcpy(dev->dev.data_buf, data, data_len);
>> >  }
>> >  p->actual_length = len;
>> > +/*
>> > + * If this is GET_DESCRIPTOR request for configuration descriptor,
>> > + * remove 'remote wakeup' flag from it to prevent idle power down
>> > + * in Windows guest
>> > + */
>> > +if (dev->suppress_remote_wake &&
>> > +control_packet->requesttype == USB_DIR_IN &&
>> > +control_packet->request == USB_REQ_GET_DESCRIPTOR &&
>> > +control_packet->value == (USB_DT_CONFIG << 8) &&
>> > +control_packet->index == 0 &&
>> > +/* bmAttributes field of config descriptor */
>> > +len > 7 && (dev->dev.data_buf[7] & USB_CFG_ATT_WAKEUP)) {
>> > +DPRINTF("Removed remote wake %04X:%04X\n",
>> > +dev->device_info.vendor_id,
>> > +dev->device_info.product_id);
>> > +dev->dev.data_buf[7] &= ~USB_CFG_ATT_WAKEUP;
>> > +}

If the property is true, and this is a GET_DESCRIPTOR control packet
with USB_CFG_ATT_WAKEUP bit set, unset it.  Correct?

Assuming it is: "suppress_wakeup" feels okay to me.

Whatever we pick, I recommend naming the USBRedirDevice member like the
property.  It's currently named @suppress_remote_wake.
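
The descriptor rewrite described above can be sketched as standalone C. This is an illustration only, not the patch: `clear_remote_wakeup()` and the two macro names are hypothetical; the offset 7 (bmAttributes) and the remote wakeup bit (bit 5, 0x20) follow the quoted hunk and the USB configuration descriptor layout.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* bmAttributes is the byte at offset 7 of a configuration descriptor. */
#define CFG_BMATTRIBUTES_OFFSET 7
/* Bit 5 of bmAttributes advertises remote wakeup. */
#define CFG_ATT_WAKEUP          0x20

/*
 * Clear the remote wakeup flag in a configuration descriptor buffer.
 * Returns true if the flag was set and has been removed, false if the
 * buffer is too short or the flag was not set.
 */
static bool clear_remote_wakeup(uint8_t *desc, size_t len)
{
    if (len > CFG_BMATTRIBUTES_OFFSET &&
        (desc[CFG_BMATTRIBUTES_OFFSET] & CFG_ATT_WAKEUP)) {
        desc[CFG_BMATTRIBUTES_OFFSET] &= (uint8_t)~CFG_ATT_WAKEUP;
        return true;
    }
    return false;
}
```

The length check mirrors the `len > 7` guard in the hunk: a guest may request only the first few bytes of the descriptor, in which case there is nothing to rewrite.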

>> >  usb_generic_async_ctrl_complete(&dev->dev, p);
>> >  }
>> >  free(data);
>> > @@ -2530,6 +2548,7 @@ static Property usbredir_properties[] = {
>> >  DEFINE_PROP_UINT8("debug", USBRedirDevice, debug, usbredirparser_warning),
>> >  DEFINE_PROP_STRING("filter", USBRedirDevice, filter_str),
>> >  DEFINE_PROP_BOOL("streams", USBRedirDevice, enable_streams, true),
>> > +DEFINE_PROP_BOOL("nowake", USBRedirDevice, suppress_remote_wake, true),
>> >  DEFINE_PROP_END_OF_LIST(),
>> >  };
>>
>> The default is .nowake=on.  Is that a guest-visible change?  Do we need
>> compat properties to keep it off for existing machine types?
>
> Guest will see the device as one without 'remote wake' capability.
> IMO, in the worst case this does not change anything, in the best case
> this will suppress device power transition to D2 and the device will
> work.
> Including existing machine types.
> Probably I did not understand the idea of 'compat property', can you
> please provide an example of some existing compat property?
> And, of course, we can keep existing behavior by default and advise to
> turn this property on to make these devices work.

Guest-visible changes require care.  Consider:

* Live migration

  This is meant to be transparent to the guest, even when we migrate to
  a different version of QEMU.  Guest-visible hardware changes are a
  no-no.

* Cold reboot ("dead" migration)

  Guests should cope with hardware changes on cold reboot.
  Nevertheless, users do not appreciate surprise changes, so we better
  control them.  Also, the Windows reactivation spectre lurks.

Our general rule is to keep the guest ABI stable for released machine
types, and change it only in the latest, not-yet-released machine type.

To achieve this, we guard the change by a device property, which
defaults to the new behavior (your patch does that already).  We use
compat properties to flip the default to old behavior for released
machine types.
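
The idea of flipping a default per machine type can be sketched in standalone C. Everything here is hypothetical and only models the concept: `struct compat_prop`, `compat_released[]` and `effective_default()` are made-up names, and the "usb-redir"/"nowake" entry merely reuses the property name from the patch under review.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compat entry: which driver, which property, which value. */
struct compat_prop {
    const char *driver;
    const char *property;
    const char *value;
};

/* Hypothetical table for an already released machine type: old behavior. */
static const struct compat_prop compat_released[] = {
    { "usb-redir", "nowake", "off" },
};

/*
 * Return the default a device property gets for a given machine type:
 * the compat table wins if it has a matching entry, otherwise the new
 * default built into the device applies.
 */
static const char *effective_default(const struct compat_prop *tbl, size_t n,
                                     const char *driver, const char *property,
                                     const char *new_default)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(tbl[i].driver, driver) == 0 &&
            strcmp(tbl[i].property, property) == 0) {
            return tbl[i].value;
        }
    }
    return new_default;
}
```

In this model the newest machine type has an empty table and gets the new behavior, while released machine types keep presenting the descriptors guests have already seen.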

We occasionally make exceptions for sufficiently harml

Re: [PATCH] usbredir: remove 'remote wake' capability from configuration descriptor

2019-11-27 Thread Gerd Hoffmann

On Wed, Nov 27, 2019 at 09:36:21AM +0200, Yuri Benditovich wrote:
> On Wed, Nov 27, 2019 at 8:36 AM Markus Armbruster  wrote:
> >
> > Yuri Benditovich  writes:
> >
> > > If the redirected device has this capability, Windows guest may
> > > place the device into D2 and expect it to wake when the device
> > > becomes active, but this will never happen. For example, when
> > > internal Bluetooth adapter is redirected, keyboards and mice
> > > connected to it do not work. Setting global property
> > > 'usb-redir.nowake=off' keeps 'remote wake' as is.
> >
> > "usb-redir.nowake=off" is a double negation.  Gets weirder when dusted
> > with syntactic sugar: "usb-redir.nonowake".  Can we think of a better
> > name?  Naming is hard...  What about "usb-redir.wakeup=on"?
> '"wakeup" is good but "wakeup=on" makes an impression that we add the
> capability to the device even if it does not have one.
> disable_wake? suppress_wake? clear_wake? wake_allowed?

remote-wakeup=on,off ?

> > > +DEFINE_PROP_BOOL("nowake", USBRedirDevice, suppress_remote_wake, true),
> > >  DEFINE_PROP_END_OF_LIST(),
> > >  };
> >
> > The default is .nowake=on.  Is that a guest-visible change?

Yes, usb descriptors change, which the guest can see.

> And, of course, we can keep existing behavior by default and advise to
> turn this property on to make these devices work.

In that case a compat property would not be needed.

But, after all the question is whether that is the best way to solve
the problem.  Most likely there is just a usb_wakeup() call missing
somewhere ...

cheers,
  Gerd

Re: [PATCH v18 1/8] numa: Extend CLI to provide initiator information for numa nodes

2019-11-27 Thread Markus Armbruster

Tao Xu  writes:

> In ACPI 6.3 chapter 5.2.27 Heterogeneous Memory Attribute Table (HMAT),
> The initiator represents processor which access to memory. And in 5.2.27.3
> Memory Proximity Domain Attributes Structure, the attached initiator is
> defined as where the memory controller responsible for a memory proximity
> domain. With attached initiator information, the topology of heterogeneous
> memory can be described.
>
> Extend CLI of "-numa node" option to indicate the initiator numa node-id.
> In the linux kernel, the codes in drivers/acpi/hmat/hmat.c parse and report
> the platform's HMAT tables.

Please mention new machine property "hmat".

> Reviewed-by: Igor Mammedov 
> Reviewed-by: Jingqi Liu 
> Suggested-by: Dan Williams 
> Signed-off-by: Tao Xu 

QAPI part looks good to me.

Re: [PATCH v18 2/8] numa: Extend CLI to provide memory latency and bandwidth information

2019-11-27 Thread Markus Armbruster

Tao Xu  writes:

> From: Liu Jingqi 
>
> Add -numa hmat-lb option to provide System Locality Latency and
> Bandwidth Information. These memory attributes help to build
> System Locality Latency and Bandwidth Information Structure(s)
> in ACPI Heterogeneous Memory Attribute Table (HMAT).

Please mention this requires -machine hmat=on.

> Signed-off-by: Liu Jingqi 
> Signed-off-by: Tao Xu 
[...]
> diff --git a/qapi/machine.json b/qapi/machine.json
> index 27d0e37534..c741649d7b 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -426,10 +426,12 @@
>  #
>  # @cpu: property based CPU(s) to node mapping (Since: 2.10)
>  #
> +# @hmat-lb: memory latency and bandwidth information (Since: 5.0)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -444,7 +446,8 @@
>'data': {
>  'node': 'NumaNodeOptions',
>  'dist': 'NumaDistOptions',
> -'cpu': 'NumaCpuOptions' }}
> +'cpu': 'NumaCpuOptions',
> +'hmat-lb': 'NumaHmatLBOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -557,6 +560,93 @@
> 'base': 'CpuInstanceProperties',
> 'data' : {} }
>  
> +##
> +# @HmatLBMemoryHierarchy:
> +#
> +# The memory hierarchy in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information about @HmatLBMemoryHierarchy see
> +# the chapter 5.2.27.4: Table 5-146: Field "Flags" of ACPI 6.3 spec.

Comma before "see", and s/the chapter/chapter/.

Suggest to fill these paragraphs more evenly:

   # The memory hierarchy in the System Locality Latency and Bandwidth
   # Information Structure of HMAT (Heterogeneous Memory Attribute Table)
   #
   # For more information about @HmatLBMemoryHierarchy, see chapter
   # 5.2.27.4: Table 5-146: Field "Flags" of ACPI 6.3 spec.

> +#
> +# @memory: the structure represents the memory performance
> +#
> +# @first-level: first level of memory side cache
> +#
> +# @second-level: second level of memory side cache
> +#
> +# @third-level: third level of memory side cache
> +#
> +# Since: 5.0
> +##
> +{ 'enum': 'HmatLBMemoryHierarchy',
> +  'data': [ 'memory', 'first-level', 'second-level', 'third-level' ] }
> +
> +##
> +# @HmatLBDataType:
> +#
> +# Data type in the System Locality Latency
> +# and Bandwidth Information Structure of HMAT (Heterogeneous
> +# Memory Attribute Table)
> +#
> +# For more information about @HmatLBDataType see
> +# the chapter 5.2.27.4: Table 5-146:  Field "Data Type" of ACPI 6.3 spec.

Likewise.

> +#
> +# @access-latency: access latency (nanoseconds)
> +#
> +# @read-latency: read latency (nanoseconds)
> +#
> +# @write-latency: write latency (nanoseconds)
> +#
> +# @access-bandwidth: access bandwidth (B/s)

We spell out bytes per second elsewhere in the QAPI schema.  Let's do
the same here.

> +#
> +# @read-bandwidth: read bandwidth (B/s)
> +#
> +# @write-bandwidth: write bandwidth (B/s)
> +#
> +# Since: 5.0
> +##
> +{ 'enum': 'HmatLBDataType',
> +  'data': [ 'access-latency', 'read-latency', 'write-latency',
> +'access-bandwidth', 'read-bandwidth', 'write-bandwidth' ] }
> +
> +##
> +# @NumaHmatLBOptions:
> +#
> +# Set the system locality latency and bandwidth information
> +# between Initiator and Target proximity Domains.
> +#
> +# For more information about @NumaHmatLBOptions see
> +# the chapter 5.2.27.4: Table 5-146 of ACPI 6.3 spec.

Likewise.

> +#
> +# @initiator: the Initiator Proximity Domain.
> +#
> +# @target: the Target Proximity Domain.
> +#
> +# @hierarchy: the Memory Hierarchy. Indicates the performance
> +# of memory or side cache.
> +#
> +# @data-type: presents the type of data, access/read/write
> +# latency or hit latency.
> +#
> +# @latency: the value of latency from @initiator to @target proximity domain,
> +#   the latency unit is "ns(nanosecond)".
> +#
> +# @bandwidth: the value of bandwidth between @initiator and @target proximity
> +# domain, the bandwidth unit is "B(/s)".

Break lines around column 70, please.

> +#
> +# Since: 5.0
> +##
> +{ 'struct': 'NumaHmatLBOptions',
> +'data': {
> +'initiator': 'uint16',
> +'target': 'uint16',
> +'hierarchy': 'HmatLBMemoryHierarchy',
> +'data-type': 'HmatLBDataType',
> +'*latency': 'uint64',
> +'*bandwidth': 'size' }}
> +
>  ##
>  # @HostMemPolicy:
>  #
[...]

[RFC 09/10] Clean up Radeon Header files

2019-11-27 Thread Aleksandar Markovic

On Tuesday, November 26, 2019,  wrote:

> From: Aaron Dominick 
>
> ---


Your commit message is poor. You should have clearly explained what you
do in this cleanup, and why.

Aleksandar


>  hw/display/atom-bits.h   |   48 -
>  hw/display/atom-names.h  |  100 -
>  hw/display/atom-types.h  |   42 -
>  hw/display/atom.h|  160 -
>  hw/display/atombios.h| 7981 --
>  hw/display/avivod.h  |   62 -
>  hw/display/cayman_blit_shaders.h |   35 -
>  hw/display/ci_dpm.h  |  341 --
>  hw/display/cik_blit_shaders.h|   32 -
>  hw/display/cikd.h| 2172 
>  hw/display/r300d.h   |  343 --
>  hw/display/r420d.h   |  249 -
>  hw/display/r520d.h   |  187 -
>  hw/display/r600_blit_shaders.h   |   38 -
>  hw/display/r600_dpm.h|  238 -
>  hw/display/r600d.h   | 2370 -
>  hw/display/radeon.h  | 2967 ---
>  hw/display/radeon_acpi.h |  456 --
>  hw/display/radeon_asic.h |  986 
>  hw/display/radeon_drv.h  |  121 -
>  hw/display/radeon_family.h   |  122 -
>  hw/display/radeon_mode.h | 1002 
>  hw/display/radeon_object.h   |  197 -
>  hw/display/radeon_trace.h|  209 -
>  hw/display/radeon_ucode.h|  227 -
>  hw/display/rs100d.h  |   40 -
>  hw/display/rs400d.h  |  160 -
>  hw/display/rs600d.h  |  685 ---
>  hw/display/rs690d.h  |  313 --
>  hw/display/rs780_dpm.h   |  109 -
>  hw/display/rs780d.h  |  171 -
>  hw/display/rv200d.h  |   36 -
>  hw/display/rv250d.h  |  123 -
>  hw/display/rv350d.h  |   52 -
>  hw/display/rv515d.h  |  638 ---
>  hw/display/rv6xx_dpm.h   |   95 -
>  hw/display/rv6xxd.h  |  246 -
>  hw/display/rv730d.h  |  165 -
>  hw/display/rv740d.h  |  117 -
>  hw/display/rv770_dpm.h   |  285 --
>  hw/display/rv770_smc.h   |  207 -
>  hw/display/rv770d.h  | 1015 
>  hw/display/si_blit_shaders.h |   32 -
>  hw/display/si_dpm.h  |  238 -
>  hw/display/sid.h | 1956 
>  hw/display/sislands_smc.h|  424 --
>  hw/display/smu7.h|  170 -
>  hw/display/smu7_discrete.h   |  514 --
>  hw/display/smu7_fusion.h |  300 --
>  hw/display/sumo_dpm.h|  221 -
>  hw/display/sumod.h   |  372 --
>  hw/display/trinity_dpm.h |  134 -
>  hw/display/trinityd.h|  228 -
>  53 files changed, 29731 deletions(-)
>  delete mode 100644 hw/display/atom-bits.h
>  delete mode 100644 hw/display/atom-names.h
>  delete mode 100644 hw/display/atom-types.h
>  delete mode 100644 hw/display/atom.h
>  delete mode 100644 hw/display/atombios.h
>  delete mode 100644 hw/display/avivod.h
>  delete mode 100644 hw/display/cayman_blit_shaders.h
>  delete mode 100644 hw/display/ci_dpm.h
>  delete mode 100644 hw/display/cik_blit_shaders.h
>  delete mode 100644 hw/display/cikd.h
>  delete mode 100644 hw/display/r300d.h
>  delete mode 100644 hw/display/r420d.h
>  delete mode 100644 hw/display/r520d.h
>  delete mode 100644 hw/display/r600_blit_shaders.h
>  delete mode 100644 hw/display/r600_dpm.h
>  delete mode 100644 hw/display/r600d.h
>  delete mode 100644 hw/display/radeon.h
>  delete mode 100644 hw/display/radeon_acpi.h
>  delete mode 100644 hw/display/radeon_asic.h
>  delete mode 100644 hw/display/radeon_drv.h
>  delete mode 100644 hw/display/radeon_family.h
>  delete mode 100644 hw/display/radeon_mode.h
>  delete mode 100644 hw/display/radeon_object.h
>  delete mode 100644 hw/display/radeon_trace.h
>  delete mode 100644 hw/display/radeon_ucode.h
>  delete mode 100644 hw/display/rs100d.h
>  delete mode 100644 hw/display/rs400d.h
>  delete mode 100644 hw/display/rs600d.h
>  delete mode 100644 hw/display/rs690d.h
>  delete mode 100644 hw/display/rs780_dpm.h
>  delete mode 100644 hw/display/rs780d.h
>  delete mode 100644 hw/display/rv200d.h
>  delete mode 100644 hw/display/rv250d.h
>  delete mode 100644 hw/display/rv350d.h
>  delete mode 100644 hw/display/rv515d.h
>  delete mode 100644 hw/display/rv6xx_dpm.h
>  delete mode 100644 hw/display/rv6xxd.h
>  delete mode 100644 hw/display/rv730d.h
>  delete mode 100644 hw/display/rv740d.h
>  delete mode 100644 hw/display/rv770_dpm.h
>  delete mode 100644 hw/display/rv770_smc.h
>  delete mode 100644 hw/display/rv770d.h
>  delete mode 100644 hw/display/si_blit_shaders.h
>  delete mode 100644 hw/display/si_dpm.h
>  delete mode 100644 hw/display/sid.h
>  delete mode 100644 hw/display/sislands_smc.h
>  delete mode 100644 hw/display/smu7.h
>  delete mode 100644 hw/display/smu7_discrete.h
>  delete mode 100644 hw/display/smu7_fusion.h
>  delete mode 100644 hw/display/sumo_dpm.h
>  delete mode 100644 hw/display/sumod.h
>  delete mode 100644 hw/display/tri

Re: [PATCH v18 3/8] numa: Extend CLI to provide memory side cache information

2019-11-27 Thread Markus Armbruster

Tao Xu  writes:

> From: Liu Jingqi 
>
> Add -numa hmat-cache option to provide Memory Side Cache Information.
> These memory attributes help to build Memory Side Cache Information
> Structure(s) in ACPI Heterogeneous Memory Attribute Table (HMAT).

Please mention this requires -machine hmat=on.

> Reviewed-by: Igor Mammedov 
> Reviewed-by: Daniel Black 
> Signed-off-by: Liu Jingqi 
> Signed-off-by: Tao Xu 
[...]
> diff --git a/qapi/machine.json b/qapi/machine.json
> index c741649d7b..3d0ba226a9 100644
> --- a/qapi/machine.json
> +++ b/qapi/machine.json
> @@ -428,10 +428,12 @@
>  #
>  # @hmat-lb: memory latency and bandwidth information (Since: 5.0)
>  #
> +# @hmat-cache: memory side cache information (Since: 5.0)
> +#
>  # Since: 2.1
>  ##
>  { 'enum': 'NumaOptionsType',
> -  'data': [ 'node', 'dist', 'cpu', 'hmat-lb' ] }
> +  'data': [ 'node', 'dist', 'cpu', 'hmat-lb', 'hmat-cache' ] }
>  
>  ##
>  # @NumaOptions:
> @@ -447,7 +449,8 @@
>  'node': 'NumaNodeOptions',
>  'dist': 'NumaDistOptions',
>  'cpu': 'NumaCpuOptions',
> -'hmat-lb': 'NumaHmatLBOptions' }}
> +'hmat-lb': 'NumaHmatLBOptions',
> +'hmat-cache': 'NumaHmatCacheOptions' }}
>  
>  ##
>  # @NumaNodeOptions:
> @@ -647,6 +650,77 @@
>  '*latency': 'uint64',
>  '*bandwidth': 'size' }}
>  
> +##
> +# @HmatCacheAssociativity:
> +#
> +# Cache associativity in the Memory Side Cache
> +# Information Structure of HMAT
> +#
> +# For more information of @HmatCacheAssociativity see
> +# the chapter 5.2.27.5: Table 5-147 of ACPI 6.3 spec.

  # Cache associativity in the Memory Side Cache Information Structure
  # of HMAT
  #
  # For more information of @HmatCacheAssociativity, see chapter
  # 5.2.27.5: Table 5-147 of ACPI 6.3 spec.

> +#
> +# @none: None

What does cache associativity @none mean?  A none-associative cache?  I
guess it makes sense to people familiar with the ACPI spec...

> +#
> +# @direct: Direct Mapped
> +#
> +# @complex: Complex Cache Indexing (implementation specific)
> +#
> +# Since: 5.0
> +##
> +{ 'enum': 'HmatCacheAssociativity',
> +  'data': [ 'none', 'direct', 'complex' ] }
> +
> +##
> +# @HmatCacheWritePolicy:
> +#
> +# Cache write policy in the Memory Side Cache
> +# Information Structure of HMAT
> +#
> +# For more information of @HmatCacheWritePolicy see
> +# the chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec.

Break lines around column 70, please.

> +#
> +# @none: None

What does cache write policy @none mean?

> +#
> +# @write-back: Write Back (WB)
> +#
> +# @write-through: Write Through (WT)
> +#
> +# Since: 5.0
> +##
> +{ 'enum': 'HmatCacheWritePolicy',
> +  'data': [ 'none', 'write-back', 'write-through' ] }
> +
> +##
> +# @NumaHmatCacheOptions:
> +#
> +# Set the memory side cache information for a given memory domain.
> +#
> +# For more information of @NumaHmatCacheOptions see
> +# the chapter 5.2.27.5: Table 5-147: Field "Cache Attributes" of ACPI 6.3 spec.
> +#
> +# @node-id: the memory proximity domain to which the memory belongs.
> +#
> +# @size: the size of memory side cache in bytes.
> +#
> +# @level: the cache level described in this structure.
> +#
> +# @assoc: the cache associativity, none/direct-mapped/complex(complex cache indexing).
> +#
> +# @policy: the write policy, none/write-back/write-through.
> +#
> +# @line: the cache Line size in bytes.
> +#
> +# Since: 5.0
> +##
> +{ 'struct': 'NumaHmatCacheOptions',
> +  'data': {
> +   'node-id': 'uint32',

Ignorant question: you use 'uint16' for other "proximity domains".  Is
'uint32' intentional here?

> +   'size': 'size',
> +   'level': 'uint8',
> +   'assoc': 'HmatCacheAssociativity',
> +   'policy': 'HmatCacheWritePolicy',
> +   'line': 'uint16' }}
> +
>  ##
>  # @HostMemPolicy:
>  #
[...]

[PATCH 0/2] analyze-migration.py: require python 3

2019-11-27 Thread Marc-André Lureau

Hi,

The following 2 patches fix some errors and deprecation warnings with
Python 3. They drop the use of numpy and Python 2 support.

Marc-André Lureau (2):
  analyze-migration.py: fix find() type error
  analyze-migration.py: replace numpy with python 3.2

 scripts/analyze-migration.py | 39 +++-
 1 file changed, 21 insertions(+), 18 deletions(-)

-- 
2.24.0

[PATCH 2/2] analyze-migration.py: replace numpy with python 3.2

2019-11-27 Thread Marc-André Lureau

Use int.from_bytes() from python 3.2 instead.

Signed-off-by: Marc-André Lureau 
---
 scripts/analyze-migration.py | 35 +++
 1 file changed, 19 insertions(+), 16 deletions(-)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index 2b835d9b70..96a31d3974 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -1,4 +1,4 @@
-#!/usr/bin/env python
+#!/usr/bin/env python3
 #
 #  Migration Stream Analyzer
 #
@@ -17,12 +17,18 @@
 # You should have received a copy of the GNU Lesser General Public
 # License along with this library; if not, see .
 
-from __future__ import print_function
-import numpy as np
 import json
 import os
 import argparse
 import collections
+import struct
+import sys
+
+
+MIN_PYTHON = (3, 2)
+if sys.version_info < MIN_PYTHON:
+sys.exit("Python %s.%s or later is required.\n" % MIN_PYTHON)
+
 
 def mkdir_p(path):
 try:
@@ -30,29 +36,26 @@ def mkdir_p(path):
 except OSError:
 pass
 
+
 class MigrationFile(object):
 def __init__(self, filename):
 self.filename = filename
 self.file = open(self.filename, "rb")
 
 def read64(self):
-return np.asscalar(np.fromfile(self.file, count=1, dtype='>i8')[0])
+return int.from_bytes(self.file.read(8), byteorder='big', signed=True)
 
 def read32(self):
-return np.asscalar(np.fromfile(self.file, count=1, dtype='>i4')[0])
+return int.from_bytes(self.file.read(4), byteorder='big', signed=True)
 
 def read16(self):
-return np.asscalar(np.fromfile(self.file, count=1, dtype='>i2')[0])
+return int.from_bytes(self.file.read(2), byteorder='big', signed=True)
 
 def read8(self):
-return np.asscalar(np.fromfile(self.file, count=1, dtype='>i1')[0])
+return int.from_bytes(self.file.read(1), byteorder='big', signed=True)
 
 def readstr(self, len = None):
-if len is None:
-len = self.read8()
-if len == 0:
-return ""
-return np.fromfile(self.file, count=1, dtype=('S%d' % len))[0]
+return self.readvar(len).decode('utf-8')
 
 def readvar(self, size = None):
 if size is None:
@@ -275,7 +278,7 @@ class VMSDFieldGeneric(object):
 return str(self.__str__())
 
 def __str__(self):
-return " ".join("{0:02x}".format(ord(c)) for c in self.data)
+return " ".join("{0:02x}".format(c) for c in self.data)
 
 def getDict(self):
 return self.__str__()
@@ -307,8 +310,8 @@ class VMSDFieldInt(VMSDFieldGeneric):
 
 def read(self):
 super(VMSDFieldInt, self).read()
-self.sdata = np.fromstring(self.data, count=1, dtype=(self.sdtype))[0]
-self.udata = np.fromstring(self.data, count=1, dtype=(self.udtype))[0]
+self.sdata = int.from_bytes(self.data, byteorder='big', signed=True)
+self.udata = int.from_bytes(self.data, byteorder='big', signed=False)
 self.data = self.sdata
 return self.data
 
@@ -363,7 +366,7 @@ class VMSDFieldStruct(VMSDFieldGeneric):
 array_len = field.pop('array_len')
 field['index'] = 0
 new_fields.append(field)
-for i in xrange(1, array_len):
+for i in range(1, array_len):
 c = field.copy()
 c['index'] = i
 new_fields.append(c)
-- 
2.24.0
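
A minimal sketch of the replacement made in the patch above, for readers
who have not used the call: int.from_bytes() decodes a big-endian signed
field much as the removed numpy '>i8' / '>i4' dtypes did. The read_be()
helper and the sample bytes are illustrative assumptions, not part of
the script.

```python
import io
import struct


def read_be(stream, size, signed=True):
    """Read `size` bytes from a binary stream as a big-endian integer."""
    raw = stream.read(size)
    if len(raw) != size:
        # Hypothetical guard: the patched script does not check for short reads.
        raise EOFError("short read: wanted %d bytes, got %d" % (size, len(raw)))
    return int.from_bytes(raw, byteorder='big', signed=signed)


# Hypothetical sample stream: a 64-bit -2, a 32-bit 0x12345678, an 8-bit 0xff.
sample = io.BytesIO(struct.pack('>qiB', -2, 0x12345678, 0xff))

v64 = read_be(sample, 8)        # signed 64-bit field
v32 = read_be(sample, 4)        # signed 32-bit field
v8_signed = read_be(sample, 1)  # 0xff read as a signed byte
```

One difference worth keeping in mind: int.from_bytes() on an empty
bytes object returns 0, so a read at end of file does not fail by
itself. The length check in the sketch is one way to surface that.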

[PATCH 1/2] analyze-migration.py: fix find() type error

2019-11-27 Thread Marc-André Lureau

Traceback (most recent call last):
  File "../scripts/analyze-migration.py", line 611, in 
dump.read(desc_only = True)
  File "../scripts/analyze-migration.py", line 513, in read
self.load_vmsd_json(file)
  File "../scripts/analyze-migration.py", line 556, in load_vmsd_json
vmsd_json = file.read_migration_debug_json()
  File "../scripts/analyze-migration.py", line 89, in read_migration_debug_json
nulpos = data.rfind("\0")
TypeError: argument should be integer or bytes-like object, not 'str'

Signed-off-by: Marc-André Lureau 
---
 scripts/analyze-migration.py | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/scripts/analyze-migration.py b/scripts/analyze-migration.py
index e527eb168e..2b835d9b70 100755
--- a/scripts/analyze-migration.py
+++ b/scripts/analyze-migration.py
@@ -86,8 +86,8 @@ class MigrationFile(object):
 
 # Find the last NULL byte, then the first brace after that. This should
 # be the beginning of our JSON data.
-nulpos = data.rfind("\0")
-jsonpos = data.find("{", nulpos)
+nulpos = data.rfind(b'\0')
+jsonpos = data.find(b'{', nulpos)
 
 # Check backwards from there and see whether we guessed right
 self.file.seek(datapos + jsonpos - 5, 0)
-- 
2.24.0
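
The traceback above comes from searching a bytes object with a str
needle. A small sketch of the fixed lookup, using a made-up blob (the
contents of `data` are an assumption for illustration only):

```python
# Hypothetical blob: binary padding, a NUL terminator, then the JSON text.
data = b'\x01\x02vmdesc\x00{"devices": []}'

nulpos = data.rfind(b'\0')         # position of the last NUL byte
jsonpos = data.find(b'{', nulpos)  # first opening brace after it

try:
    data.rfind("\0")               # str needle on bytes, as before the fix
    str_needle_ok = True
except TypeError:
    str_needle_ok = False

json_text = data[jsonpos:].decode('utf-8')
```

Under Python 3 the file is opened in "rb" mode, so everything read from
it is bytes and the needles have to be bytes literals as well.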

Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-27 Thread Igor Mammedov

On Wed, 27 Nov 2019 09:40:14 +0800
Xiang Zheng  wrote:

> Hi,
> 
> On 2019/11/16 0:37, Igor Mammedov wrote:
> > On Mon, 11 Nov 2019 09:40:47 +0800
> > Xiang Zheng  wrote:
> >   
> >> From: Dongjiu Geng 
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmap the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifies guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng 
> >> Signed-off-by: Xiang Zheng 
> >> Reviewed-by: Michael S. Tsirkin 
> >> ---
> >>  hw/acpi/acpi_ghes.c | 297 
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h|   3 +-
> >>  target/arm/cpu.h|   4 +
> >>  target/arm/helper.c |   2 +-
> >>  target/arm/internals.h  |   5 +-
> >>  target/arm/kvm64.c  |  64 
> >>  target/arm/tlb_helper.c |   2 +-
> >>  target/i386/cpu.h   |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
> >>  
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH   72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */  
> > maybe use one line comment
> >   
> >> +#define ACPI_GHES_MEM_CPER_LENGTH   80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */  
> > ditto
> >   
> >> +#define ACPI_GEBS_UNCORRECTABLE 1
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */  
> > ditto
> >   
> 
> OK, I will use one line comment.
> 
> >> +enum AcpiGenericErrorSeverity {
> >> +ACPI_CPER_SEV_RECOVERABLE,
> >> +ACPI_CPER_SEV_FATAL,
> >> +ACPI_CPER_SEV_CORRECTED,
> >> +ACPI_CPER_SEV_NONE,  
> > I'd assign values explicitly here
> >   foo = x,
> >   ...  
> 
> OK.
> 
> >   
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>  
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)\
> >> {{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 0xff, \
> >> +((b) >> 8) & 0xff, (b) & 0xff,   \
> >> +((c) >> 8) & 0xff, (c) & 0xff,\
> >> +(d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM   \
> >> +UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +0xED, 0x7C, 0x83, 0xB1)
> >> +
> >>  /*
> >>   * | +--+ 0
> >>   * | |Header|
> >> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >>  uint64_t ghes_addr_le;
> >>  } AcpiGhesState;
> >>  
> >> +/*
> >> + * Total size for Generic Error Status Block
> >> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-380 Generic Error Status Block
> >> + */
> >> +#define ACPI_GHES_GESB_SIZE 20  
> >   
> >> +/* The offset of Data Length in Generic Error Status Block */
> >> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12  
> > 
> > unused, drop it
> >   
> 
> OK.
> 
> >> +
> >> +/*
> >> + * Record the value of data length for each error status block to avoid getting
> >> + * this value from guest.
> >> + */
> >> +static uint32_t acpi_ghes_data_length[ACPI_GHES_ERROR_SOURCE_COUNT];
> >> +
> >> +/*
> >>
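
As a cross-check of the UUID_BE() macro quoted above, a small sketch of
the byte layout it produces: every field is stored most significant
byte first. The uuid_be() helper is a hypothetical Python mirror of the
macro, not code from the series.

```python
import uuid


def uuid_be(a, b, c, *d):
    """Mirror the quoted UUID_BE() macro: each field is stored big-endian."""
    if len(d) != 8:
        raise ValueError("expected 8 trailing bytes")
    return (a.to_bytes(4, 'big') + b.to_bytes(2, 'big')
            + c.to_bytes(2, 'big') + bytes(d))


# Same arguments as UEFI_CPER_SEC_PLATFORM_MEM in the quoted patch.
mem_section = uuid_be(0xA5BC1114, 0x6F64, 0x4EDE,
                      0xB8, 0x63, 0x3E, 0x83, 0xED, 0x7C, 0x83, 0xB1)

# Python's uuid module exposes the same big-endian layout as .bytes.
reference = uuid.UUID('a5bc1114-6f64-4ede-b863-3e83ed7c83b1')
```

The mixed-endian layout (uuid.UUID.bytes_le) differs in the first eight
bytes, which makes it easy to tell which of the two a given table uses.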

Re: [PATCH] usbredir: remove 'remote wake' capability from configuration descriptor

2019-11-27 Thread Yuri Benditovich

On Wed, Nov 27, 2019 at 11:40 AM Gerd Hoffmann  wrote:
>
> On Wed, Nov 27, 2019 at 09:36:21AM +0200, Yuri Benditovich wrote:
> > On Wed, Nov 27, 2019 at 8:36 AM Markus Armbruster  wrote:
> > >
> > > Yuri Benditovich  writes:
> > >
> > > > If the redirected device has this capability, Windows guest may
> > > > place the device into D2 and expect it to wake when the device
> > > > becomes active, but this will never happen. For example, when
> > > > internal Bluetooth adapter is redirected, keyboards and mice
> > > > connected to it do not work. Setting global property
> > > > 'usb-redir.nowake=off' keeps 'remote wake' as is.
> > >
> > > "usb-redir.nowake=off" is a double negation.  Gets weirder when dusted
> > > with syntactic sugar: "usb-redir.nonowake".  Can we think of a better
> > > name?  Naming is hard...  What about "usb-redir.wakeup=on"?
> > "wakeup" is good but "wakeup=on" makes an impression that we add the
> > capability to the device even if it does not have one.
> > disable_wake? suppress_wake? clear_wake? wake_allowed?
>
> remote-wakeup=on,off ?

This is like wakeup=on, suggesting that we turn wake on even if it is
not supported.
Anyway, I agree with any name.

>
> > > > > +DEFINE_PROP_BOOL("nowake", USBRedirDevice, suppress_remote_wake, true),
> > > >  DEFINE_PROP_END_OF_LIST(),
> > > >  };
> > >
> > > The default is .nowake=on.  Is that a guest-visible change?
>
> Yes, usb descriptors change, which the guest can see.
>
> > And, of course, we can keep existing behavior by default and advise to
> > turn this property on to make these devices work.
>
> In that case a compat property would not be needed.
>
> But, after all the question is whether that is the best way to solve
> the problem.  Most likely there is just a usb_wakeup() call missing
> somewhere ...
>

Indeed, it would be good to call usb_wakeup(), but ... there is no
trigger to do that.

When the guest places the device into D2, it cancels all pending URBs,
so no request is left to complete on the spice client side that could
call usb_wakeup() on the QEMU side.
The device on the spice client side is powered up without any active
request, so it will not produce a wake-up event.
The usb-redir protocol, client-side libusb, and libusb's kernel partner
(UsbDk/winusb in the case of a Windows client) can't process such a
flow correctly.
Of course, AFAIK.

A similar problem happens with local redirection, BTW. But this is for
another patch.

> cheers,
>   Gerd
>

Re: Network connection with COLO VM

2019-11-27 Thread Dr. David Alan Gilbert

* Daniel Cho (daniel...@qnap.com) wrote:
> Hello everyone,
> 
> Could we ssh to colo VM (means PVM & SVM are starting)?
> 

Let's cc in Zhang Chen and Lukas Straub.

> SSH will connect to colo VM for a while, but it will disconnect with error
> *client_loop: send disconnect: Broken pipe*
> 
> It seems the colo VM could not keep the network session.
> 
> Is it a known issue?

That sounds like the COLO proxy is getting upset; it's supposed
to compare packets sent by the primary and secondary and only
send one to the outside - you shouldn't be talking directly to
the guest, but always via the proxy.  See docs/colo-proxy.txt

Dave

> Best Regard,
> Daniel Cho
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [RESEND PATCH v21 3/6] ACPI: Add APEI GHES table generation support

2019-11-27 Thread gengdongjiu

On 2019/11/25 17:48, Igor Mammedov wrote:
>>>..
>>> bios_linker_loader_add_pointer(linker, ACPI_BUILD_TABLE_FILE,
>>> ACPI_GHES_ERROR_STATUS_ADDRESS_OFFSET(hest_start, source_id),
>>> sizeof(uint64_t), ACPI_GHES_ERRORS_FW_CFG_FILE,
>>> source_id * sizeof(uint64_t));
>>>   ...
>>> }
>>>
>>> My previous series patch support 2 error sources, but now only enable 'SEA' 
>>> type Error Source  
>> I'd try to merge this, worry about extending things later.
>> This is at v21 and the simpler you can keep things,
>> the faster it'll go in.
> I don't think the series is ready for merging yet.
> It has a number of issues (not stylistic ones) that need to be fixed first.
> 
> As for extending, I think I've suggested to simplify series
> to account for single error source only in some places so it
> would be easier on author and reviewers and worry about extending
> it later.
Sure, thanks for the review. We are preparing another series that will
fix the issues you mentioned.

>

Re: [PATCH for-5.0 v11 01/20] migration: Support QLIST migration

2019-11-27 Thread Dr. David Alan Gilbert

* Eric Auger (eric.au...@redhat.com) wrote:
> Support QLIST migration using the same principle as QTAILQ:
> 94869d5c52 ("migration: migrate QTAILQ").
> 
> The VMSTATE_QLIST_V macro has the same proto as VMSTATE_QTAILQ_V.
> The change mainly resides in QLIST RAW macros: QLIST_RAW_INSERT_HEAD
> and QLIST_RAW_REVERSE.
> 
> Tests also are provided.
> 
> Signed-off-by: Eric Auger 
> 
> ---
> 
> v5 - v6:
> - by doing more advanced testing with virtio-iommu migration
>   I noticed this was broken. "prev" field was not set properly.
>   I improved the tests to manipulate both the next and prev
>   fields.
> - Removed Peter and Juan's R-b
> ---
>  include/migration/vmstate.h |  21 +
>  include/qemu/queue.h|  39 +
>  migration/trace-events  |   5 ++
>  migration/vmstate-types.c   |  70 +++
>  tests/test-vmstate.c| 170 
>  5 files changed, 305 insertions(+)
> 
> diff --git a/include/migration/vmstate.h b/include/migration/vmstate.h
> index ac4f46a67d..08683d93c6 100644
> --- a/include/migration/vmstate.h
> +++ b/include/migration/vmstate.h
> @@ -227,6 +227,7 @@ extern const VMStateInfo vmstate_info_tmp;
>  extern const VMStateInfo vmstate_info_bitmap;
>  extern const VMStateInfo vmstate_info_qtailq;
>  extern const VMStateInfo vmstate_info_gtree;
> +extern const VMStateInfo vmstate_info_qlist;
>  
>  #define type_check_2darray(t1,t2,n,m) ((t1(*)[n][m])0 - (t2*)0)
>  /*
> @@ -796,6 +797,26 @@ extern const VMStateInfo vmstate_info_gtree;
>  .offset   = offsetof(_state, _field), \
>  }
>  
> +/*
> + * For migrating a QLIST
> + * Target QLIST needs be properly initialized.
> + * _type: type of QLIST element
> + * _next: name of QLIST_ENTRY entry field in QLIST element
> + * _vmsd: VMSD for QLIST element
> + * size: size of QLIST element
> + * start: offset of QLIST_ENTRY in QTAILQ element
> + */
> +#define VMSTATE_QLIST_V(_field, _state, _version, _vmsd, _type, _next)  \
> +{\
> +.name = (stringify(_field)), \
> +.version_id   = (_version),  \
> +.vmsd = &(_vmsd),\
> +.size = sizeof(_type),   \
> +.info = &vmstate_info_qlist, \
> +.offset   = offsetof(_state, _field),\
> +.start= offsetof(_type, _next),  \
> +}
> +
>  /* _f : field name
> _f_n : num of elements field_name
> _n : num of elements
> diff --git a/include/qemu/queue.h b/include/qemu/queue.h
> index 4764d93ea3..4d4554a7ce 100644
> --- a/include/qemu/queue.h
> +++ b/include/qemu/queue.h
> @@ -501,4 +501,43 @@ union { \
>  QTAILQ_RAW_TQH_CIRC(head)->tql_prev = QTAILQ_RAW_TQE_CIRC(elm, entry); \
>  } while (/*CONSTCOND*/0)
>  
> +#define QLIST_RAW_FIRST(head) \
> +field_at_offset(head, 0, void *)
> +
> +#define QLIST_RAW_NEXT(elm, entry) \
> +field_at_offset(elm, entry, void *)
> +
> +#define QLIST_RAW_PREVIOUS(elm, entry) \
> +field_at_offset(elm, entry + sizeof(void *), void *)
> +
> +#define QLIST_RAW_FOREACH(elm, head, entry) \
> +for ((elm) = *QLIST_RAW_FIRST(head); \
> + (elm); \
> + (elm) = *QLIST_RAW_NEXT(elm, entry))
> +
> +#define QLIST_RAW_INSERT_HEAD(head, elm, entry) do { \
> +void *first = *QLIST_RAW_FIRST(head); \
> +*QLIST_RAW_FIRST(head) = elm; \
> +*QLIST_RAW_PREVIOUS(elm, entry) = QLIST_RAW_FIRST(head); \
> +if (first) { \
> +*QLIST_RAW_NEXT(elm, entry) = first; \
> +*QLIST_RAW_PREVIOUS(first, entry) = QLIST_RAW_NEXT(elm, entry); \
> +} else { \
> +*QLIST_RAW_NEXT(elm, entry) = NULL; \
> +} \
> +} while (0)
> +
> +#define QLIST_RAW_REVERSE(head, elm, entry) do { \
> +void *iter = *QLIST_RAW_FIRST(head), *prev = NULL, *next; \
> +while (iter) {

[PATCH 1/7] console: add graphic_hw_update_done()

2019-11-27 Thread Marc-André Lureau

Add a function to be called when a graphic update is done.

Declare the QXL renderer as async: render_update_cookie_num counts the
number of outstanding updates, and graphic_hw_update_done() is called
when it reaches none.

(note: this is preliminary work for asynchronous screendump support)

Signed-off-by: Marc-André Lureau 
Reviewed-by: Gerd Hoffmann 
---
 hw/display/qxl-render.c | 9 +++--
 hw/display/qxl.c| 1 +
 include/ui/console.h| 2 ++
 ui/console.c| 9 +
 4 files changed, 19 insertions(+), 2 deletions(-)

diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c
index f7fdc4901e..3ce2e57b8f 100644
--- a/hw/display/qxl-render.c
+++ b/hw/display/qxl-render.c
@@ -109,7 +109,7 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl)
 qxl->guest_primary.surface.mem,
 MEMSLOT_GROUP_GUEST);
 if (!qxl->guest_primary.data) {
-return;
+goto end;
 }
 qxl_set_rect_to_surface(qxl, &qxl->dirty[0]);
 qxl->num_dirty_rects = 1;
@@ -137,7 +137,7 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl)
 }
 
 if (!qxl->guest_primary.data) {
-return;
+goto end;
 }
 for (i = 0; i < qxl->num_dirty_rects; i++) {
 if (qemu_spice_rect_is_empty(qxl->dirty+i)) {
@@ -158,6 +158,11 @@ static void qxl_render_update_area_unlocked(PCIQXLDevice *qxl)
qxl->dirty[i].bottom - qxl->dirty[i].top);
 }
 qxl->num_dirty_rects = 0;
+
+end:
+if (qxl->render_update_cookie_num == 0) {
+graphic_hw_update_done(qxl->ssd.dcl.con);
+}
 }
 
 /*
diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index cd7eb39d20..6d43b7433c 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -1181,6 +1181,7 @@ static const QXLInterface qxl_interface = {
 
 static const GraphicHwOps qxl_ops = {
 .gfx_update  = qxl_hw_update,
+.gfx_update_async = true,
 };
 
 static void qxl_enter_vga_mode(PCIQXLDevice *d)
diff --git a/include/ui/console.h b/include/ui/console.h
index f981696848..281f9c145b 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -365,6 +365,7 @@ static inline void console_write_ch(console_ch_t *dest, uint32_t ch)
 typedef struct GraphicHwOps {
 void (*invalidate)(void *opaque);
 void (*gfx_update)(void *opaque);
+bool gfx_update_async; /* if true, calls graphic_hw_update_done() */
 void (*text_update)(void *opaque, console_ch_t *text);
 void (*update_interval)(void *opaque, uint64_t interval);
 int (*ui_info)(void *opaque, uint32_t head, QemuUIInfo *info);
@@ -380,6 +381,7 @@ void graphic_console_set_hwops(QemuConsole *con,
 void graphic_console_close(QemuConsole *con);
 
 void graphic_hw_update(QemuConsole *con);
+void graphic_hw_update_done(QemuConsole *con);
 void graphic_hw_invalidate(QemuConsole *con);
 void graphic_hw_text_update(QemuConsole *con, console_ch_t *chardata);
 void graphic_hw_gl_block(QemuConsole *con, bool block);
diff --git a/ui/console.c b/ui/console.c
index 82d1ddac9c..3c941528d2 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -259,13 +259,22 @@ static void gui_setup_refresh(DisplayState *ds)
 ds->have_text = have_text;
 }
 
+void graphic_hw_update_done(QemuConsole *con)
+{
+}
+
 void graphic_hw_update(QemuConsole *con)
 {
+bool async = false;
 if (!con) {
 con = active_console;
 }
 if (con && con->hw_ops->gfx_update) {
 con->hw_ops->gfx_update(con->hw);
+async = con->hw_ops->gfx_update_async;
+}
+if (!async) {
+graphic_hw_update_done(con);
 }
 }
 
-- 
2.24.0

[PATCH 0/7] console: screendump improvements

2019-11-27 Thread Marc-André Lureau

Hi,

The following patches have been extracted from the "[PATCH v6 00/25]
monitor: add asynchronous command type", as they are
reviewable/mergeable independently.

They introduce some internal API changes, and fix
qemu_open()/qemu_close()/unlink() misuses which should be quite
harmless.

Marc-André Lureau (7):
  console: add graphic_hw_update_done()
  ppm-save: pass opened fd
  ui: add pixman image g_autoptr support
  object: add g_autoptr support
  screendump: replace FILE with QIOChannel and fix close()/qemu_close()
  osdep: add qemu_unlink()
  screendump: use qemu_unlink()

 hw/display/qxl-render.c  |  9 +++--
 hw/display/qxl.c |  1 +
 include/qemu/osdep.h |  1 +
 include/qom/object.h |  3 ++
 include/ui/console.h |  2 ++
 include/ui/qemu-pixman.h |  2 ++
 ui/console.c | 74 +---
 ui/trace-events  |  2 +-
 util/osdep.c | 15 
 9 files changed, 71 insertions(+), 38 deletions(-)

-- 
2.24.0

[PATCH 2/7] ppm-save: pass opened fd

2019-11-27 Thread Marc-André Lureau

This allows pre-opening the file before running the async finish
handler, avoiding potential monitor fdset races.

(note: this is preliminary work for asynchronous screendump support)

Signed-off-by: Marc-André Lureau 
---
 ui/console.c| 45 ++---
 ui/trace-events |  2 +-
 2 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/ui/console.c b/ui/console.c
index 3c941528d2..77d62fe76d 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -193,6 +193,7 @@ static void dpy_refresh(DisplayState *s);
 static DisplayState *get_alloc_displaystate(void);
 static void text_console_update_cursor_timer(void);
 static void text_console_update_cursor(void *opaque);
+static bool ppm_save(int fd, DisplaySurface *ds, Error **errp);
 
 static void gui_update(void *opaque)
 {
@@ -308,29 +309,22 @@ void graphic_hw_invalidate(QemuConsole *con)
 }
 }
 
-static void ppm_save(const char *filename, DisplaySurface *ds,
- Error **errp)
+static bool ppm_save(int fd, DisplaySurface *ds, Error **errp)
 {
 int width = pixman_image_get_width(ds->image);
 int height = pixman_image_get_height(ds->image);
-int fd;
 FILE *f;
 int y;
 int ret;
 pixman_image_t *linebuf;
+bool success = false;
 
-trace_ppm_save(filename, ds);
-fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0666);
-if (fd == -1) {
-error_setg(errp, "failed to open file '%s': %s", filename,
-   strerror(errno));
-return;
-}
+trace_ppm_save(fd, ds);
 f = fdopen(fd, "wb");
 ret = fprintf(f, "P6\n%d %d\n%d\n", width, height, 255);
 if (ret < 0) {
 linebuf = NULL;
-goto write_err;
+goto end;
 }
 linebuf = qemu_pixman_linebuf_create(PIXMAN_BE_r8g8b8, width);
 for (y = 0; y < height; y++) {
@@ -339,21 +333,16 @@ static void ppm_save(const char *filename, DisplaySurface *ds,
 ret = fwrite(pixman_image_get_data(linebuf), 1,
  pixman_image_get_stride(linebuf), f);
 (void)ret;
-if (ferror(f)) {
-goto write_err;
-}
+success = !ferror(f);
 }
 
-out:
+end:
+if (!success) {
+error_setg(errp, "failed to write to PPM file: %s", strerror(errno));
+}
 qemu_pixman_image_unref(linebuf);
 fclose(f);
-return;
-
-write_err:
-error_setg(errp, "failed to write to file '%s': %s", filename,
-   strerror(errno));
-unlink(filename);
-goto out;
+return success;
 }
 
 void qmp_screendump(const char *filename, bool has_device, const char *device,
@@ -361,6 +350,7 @@ void qmp_screendump(const char *filename, bool has_device, const char *device,
 {
 QemuConsole *con;
 DisplaySurface *surface;
+int fd;
 
 if (has_device) {
 con = qemu_console_lookup_by_device_name(device, has_head ? head : 0,
@@ -387,7 +377,16 @@ void qmp_screendump(const char *filename, bool has_device, const char *device,
 return;
 }
 
-ppm_save(filename, surface, errp);
+fd = qemu_open(filename, O_WRONLY | O_CREAT | O_TRUNC | O_BINARY, 0666);
+if (fd == -1) {
+error_setg(errp, "failed to open file '%s': %s", filename,
+   strerror(errno));
+return;
+}
+
+if (!ppm_save(fd, surface, errp)) {
+unlink(filename);
+}
 }
 
 void graphic_hw_text_update(QemuConsole *con, console_ch_t *chardata)
diff --git a/ui/trace-events b/ui/trace-events
index 63de72a798..0dcda393c1 100644
--- a/ui/trace-events
+++ b/ui/trace-events
@@ -15,7 +15,7 @@ displaysurface_create_pixman(void *display_surface) "surface=%p"
 displaysurface_free(void *display_surface) "surface=%p"
 displaychangelistener_register(void *dcl, const char *name) "%p [ %s ]"
 displaychangelistener_unregister(void *dcl, const char *name) "%p [ %s ]"
-ppm_save(const char *filename, void *display_surface) "%s surface=%p"
+ppm_save(int fd, void *display_surface) "fd=%d surface=%p"
 
 # gtk.c
 # gtk-gl-area.c
-- 
2.24.0
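
The split of responsibilities in the patch above (the caller opens and,
on failure, removes the file; the writer only sees a descriptor) can be
sketched in a few lines of Python. This is an illustration under
assumptions, not a port of the C code: the function names only echo the
patch, and the pixel rows are passed in directly instead of coming from
a pixman surface.

```python
import os
import tempfile


def ppm_save(fd, width, height, rgb_rows):
    """Write a binary PPM (P6) image to an already-open file descriptor.

    Returns True on success and False on a write error. The path stays
    under the caller's control.
    """
    try:
        with os.fdopen(fd, "wb") as f:   # closes fd on exit, like fclose()
            f.write(b"P6\n%d %d\n%d\n" % (width, height, 255))
            for row in rgb_rows:
                f.write(row)
        return True
    except OSError:
        return False


def screendump(path, width, height, rgb_rows):
    """Open the file first, then hand only the descriptor to the writer."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_BINARY", 0)
    fd = os.open(path, flags, 0o666)
    if not ppm_save(fd, width, height, rgb_rows):
        os.unlink(path)
        return False
    return True


# Hypothetical 2x1 image: one red pixel, one blue pixel.
tmpdir = tempfile.mkdtemp()
path = os.path.join(tmpdir, "shot.ppm")
ok = screendump(path, 2, 1, [bytes([255, 0, 0, 0, 0, 255])])
with open(path, "rb") as f:
    contents = f.read()
```

Because the writer never sees the file name, it cannot remove the file
itself, which is why the unlink moves to the caller in the patch.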

[PATCH 7/7] screendump: use qemu_unlink()

2019-11-27 Thread Marc-André Lureau

Don't attempt to remove /dev/fdset files.

Signed-off-by: Marc-André Lureau 
---
 ui/console.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ui/console.c b/ui/console.c
index 587edf4ed4..e6ac462aa0 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -381,7 +381,7 @@ void qmp_screendump(const char *filename, bool has_device, const char *device,
 }
 
 if (!ppm_save(fd, surface, errp)) {
-unlink(filename);
+qemu_unlink(filename);
 }
 }
 
-- 
2.24.0

[PATCH 5/7] screendump: replace FILE with QIOChannel and fix close()/qemu_close()

2019-11-27 Thread Marc-André Lureau

The file opened for ppm_save() may be a /dev/fdset, in which case a
dup fd is added to the fdset. It should be removed by calling
qemu_close(), instead of the implicit close() on fclose().

I don't see a convenient way to solve that with stdio streams, so I
switched the code to QIOChannel which uses qemu_close().

Signed-off-by: Marc-André Lureau 
---
 ui/console.c | 38 +-
 1 file changed, 17 insertions(+), 21 deletions(-)

diff --git a/ui/console.c b/ui/console.c
index 77d62fe76d..587edf4ed4 100644
--- a/ui/console.c
+++ b/ui/console.c
@@ -33,6 +33,7 @@
 #include "chardev/char-fe.h"
 #include "trace.h"
 #include "exec/memory.h"
+#include "io/channel-file.h"
 
 #define DEFAULT_BACKSCROLL 512
 #define CONSOLE_CURSOR_PERIOD 500
@@ -313,36 +314,31 @@ static bool ppm_save(int fd, DisplaySurface *ds, Error **errp)
 {
 int width = pixman_image_get_width(ds->image);
 int height = pixman_image_get_height(ds->image);
-FILE *f;
+g_autoptr(Object) ioc = OBJECT(qio_channel_file_new_fd(fd));
+g_autofree char *header = NULL;
+g_autoptr(pixman_image_t) linebuf = NULL;
+g_autoptr(GError) error = NULL;
 int y;
-int ret;
-pixman_image_t *linebuf;
-bool success = false;
 
 trace_ppm_save(fd, ds);
-f = fdopen(fd, "wb");
-ret = fprintf(f, "P6\n%d %d\n%d\n", width, height, 255);
-if (ret < 0) {
-linebuf = NULL;
-goto end;
+
+header = g_strdup_printf("P6\n%d %d\n%d\n", width, height, 255);
+if (qio_channel_write_all(QIO_CHANNEL(ioc),
+  header, strlen(header), errp) < 0) {
+return false;
 }
+
 linebuf = qemu_pixman_linebuf_create(PIXMAN_BE_r8g8b8, width);
 for (y = 0; y < height; y++) {
 qemu_pixman_linebuf_fill(linebuf, ds->image, width, 0, y);
-clearerr(f);
-ret = fwrite(pixman_image_get_data(linebuf), 1,
- pixman_image_get_stride(linebuf), f);
-(void)ret;
-success = !ferror(f);
+if (qio_channel_write_all(QIO_CHANNEL(ioc),
+  (char *)pixman_image_get_data(linebuf),
+  pixman_image_get_stride(linebuf), errp) < 0) {
+return false;
+}
 }
 
-end:
-if (!success) {
-error_setg(errp, "failed to write to PPM file: %s", strerror(errno));
-}
-qemu_pixman_image_unref(linebuf);
-fclose(f);
-return success;
+return true;
 }
 
 void qmp_screendump(const char *filename, bool has_device, const char *device,
-- 
2.24.0
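
For readers unfamiliar with the pattern, here is a minimal sketch in plain POSIX C of what a "write everything or fail" helper does, which is the role qio_channel_write_all() plays in the patch above. The names write_all() and ppm_write_header() are hypothetical and exist only for this illustration; they are not QEMU functions, and the real code goes through QIOChannel rather than calling write(2) directly.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write the whole buffer or return -1 (errno set). */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;           /* interrupted: retry the same chunk */
            }
            return -1;
        }
        p += n;                     /* partial write: advance and loop */
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: emit a binary PPM ("P6") header with maxval 255. */
static int ppm_write_header(int fd, int width, int height)
{
    char header[64];
    int n = snprintf(header, sizeof(header), "P6\n%d %d\n%d\n",
                     width, height, 255);

    assert(n > 0 && (size_t)n < sizeof(header));
    return write_all(fd, header, strlen(header));
}
```

The caller stays responsible for closing the descriptor, which is the point of the patch: the close has to go through the matching close routine rather than an implicit close() inside fclose().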

[PATCH 4/7] object: add g_autoptr support

2019-11-27 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 include/qom/object.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/qom/object.h b/include/qom/object.h
index 128d00c77f..f96a44be64 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -1747,4 +1747,7 @@ Object *container_get(Object *root, const char *path);
  * Returns the instance_size of the given @typename.
  */
 size_t object_type_get_instance_size(const char *typename);
+
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(Object, object_unref)
+
 #endif
-- 
2.24.0
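
As background for this patch and for 3/7: GLib's g_autoptr() is built on the GCC/Clang "cleanup" variable attribute. The sketch below shows that mechanism without GLib. Obj, obj_new(), obj_unref(), obj_auto_cleanup(), AUTO_OBJ and use_object() are hypothetical names made up for this example; they only stand in for a refcounted type such as Object or pixman_image_t.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted type standing in for Object / pixman_image_t. */
typedef struct Obj {
    int refcount;
    int *live_counter;   /* decremented when the object is finally freed */
} Obj;

static Obj *obj_new(int *live_counter)
{
    Obj *o = calloc(1, sizeof(*o));

    assert(o != NULL);
    o->refcount = 1;
    o->live_counter = live_counter;
    (*live_counter)++;
    return o;
}

static void obj_unref(Obj *o)
{
    if (o && --o->refcount == 0) {
        (*o->live_counter)--;
        free(o);
    }
}

/* A cleanup handler receives a pointer to the variable leaving scope. */
static void obj_auto_cleanup(Obj **p)
{
    obj_unref(*p);
}

#define AUTO_OBJ __attribute__((cleanup(obj_auto_cleanup))) Obj *

/* Every return path drops the reference without an explicit unref. */
static int use_object(int *live_counter, int fail_early)
{
    AUTO_OBJ o = obj_new(live_counter);

    if (fail_early) {
        return -1;       /* o is released here */
    }
    return o->refcount;  /* ... and here, after the value is computed */
}
```

This is why the ppm_save() rewrite in 5/7 can simply "return false" on error: the variables declared with g_autoptr()/g_autofree are released on every exit path.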

[PATCH 3/7] ui: add pixman image g_autoptr support

2019-11-27 Thread Marc-André Lureau

Signed-off-by: Marc-André Lureau 
---
 include/ui/qemu-pixman.h | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/include/ui/qemu-pixman.h b/include/ui/qemu-pixman.h
index 0668109305..3b7cf70157 100644
--- a/include/ui/qemu-pixman.h
+++ b/include/ui/qemu-pixman.h
@@ -90,4 +90,6 @@ void qemu_pixman_glyph_render(pixman_image_t *glyph,
   pixman_color_t *bgcol,
   int x, int y, int cw, int ch);
 
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(pixman_image_t, qemu_pixman_image_unref)
+
 #endif /* QEMU_PIXMAN_H */
-- 
2.24.0

[PATCH 6/7] osdep: add qemu_unlink()

2019-11-27 Thread Marc-André Lureau

Add a helper function to match qemu_open() which may return files
under the /dev/fdset prefix. Those shouldn't be removed, since it's
only a qemu namespace.

Signed-off-by: Marc-André Lureau 
---
 include/qemu/osdep.h |  1 +
 util/osdep.c | 15 +++
 2 files changed, 16 insertions(+)

diff --git a/include/qemu/osdep.h b/include/qemu/osdep.h
index 0f97d68586..9bd3dcfd13 100644
--- a/include/qemu/osdep.h
+++ b/include/qemu/osdep.h
@@ -462,6 +462,7 @@ int qemu_mprotect_none(void *addr, size_t size);
 
 int qemu_open(const char *name, int flags, ...);
 int qemu_close(int fd);
+int qemu_unlink(const char *name);
 #ifndef _WIN32
 int qemu_dup(int fd);
 #endif
diff --git a/util/osdep.c b/util/osdep.c
index 3f04326040..f7d06050f7 100644
--- a/util/osdep.c
+++ b/util/osdep.c
@@ -370,6 +370,21 @@ int qemu_close(int fd)
 return close(fd);
 }
 
+/*
+ * Delete a file from the filesystem, unless the filename is /dev/fdset/...
+ *
+ * Returns: On success, zero is returned.  On error, -1 is returned,
+ * and errno is set appropriately.
+ */
+int qemu_unlink(const char *name)
+{
+if (g_str_has_prefix(name, "/dev/fdset/")) {
+return 0;
+}
+
+return unlink(name);
+}
+
 /*
  * A variant of write(2) which handles partial write.
  *
-- 
2.24.0

[PATCH] iotests/273: Filter format-specific information

2019-11-27 Thread Max Reitz

Doing this allows running this test with e.g. -o compat=0.10 or
-o compat=refcount_bits=1.

Signed-off-by: Max Reitz 
---
 tests/qemu-iotests/273 |  3 ++-
 tests/qemu-iotests/273.out | 27 ---
 2 files changed, 2 insertions(+), 28 deletions(-)

diff --git a/tests/qemu-iotests/273 b/tests/qemu-iotests/273
index 98a672516d..d598c47d9b 100755
--- a/tests/qemu-iotests/273
+++ b/tests/qemu-iotests/273
@@ -48,7 +48,8 @@ do_run_qemu()
 run_qemu()
 {
 do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu | _filter_qmp |
-_filter_generated_node_ids | _filter_imgfmt | _filter_actual_image_size
+_filter_generated_node_ids | _filter_imgfmt |
+_filter_actual_image_size | _filter_img_info
 }
 
 TEST_IMG="$TEST_IMG.base" _make_test_img 64M
diff --git a/tests/qemu-iotests/273.out b/tests/qemu-iotests/273.out
index c410fee5c4..684b8d6f77 100644
--- a/tests/qemu-iotests/273.out
+++ b/tests/qemu-iotests/273.out
@@ -38,15 +38,6 @@ Testing: -blockdev file,node-name=base,filename=TEST_DIR/t.IMGFMT.base -blockdev
 "cluster-size": 65536,
 "format": "IMGFMT",
 "actual-size": SIZE,
-"format-specific": {
-"type": "IMGFMT",
-"data": {
-"compat": "1.1",
-"lazy-refcounts": false,
-"refcount-bits": 16,
-"corrupt": false
-}
-},
 "full-backing-filename": "TEST_DIR/t.IMGFMT.base",
 "backing-filename": "TEST_DIR/t.IMGFMT.base",
 "dirty-flag": false
@@ -57,15 +48,6 @@ Testing: -blockdev file,node-name=base,filename=TEST_DIR/t.IMGFMT.base -blockdev
 "cluster-size": 65536,
 "format": "IMGFMT",
 "actual-size": SIZE,
-"format-specific": {
-"type": "IMGFMT",
-"data": {
-"compat": "1.1",
-"lazy-refcounts": false,
-"refcount-bits": 16,
-"corrupt": false
-}
-},
 "full-backing-filename": "TEST_DIR/t.IMGFMT.mid",
 "backing-filename": "TEST_DIR/t.IMGFMT.mid",
 "dirty-flag": false
@@ -136,15 +118,6 @@ Testing: -blockdev file,node-name=base,filename=TEST_DIR/t.IMGFMT.base -blockdev
 "cluster-size": 65536,
 "format": "IMGFMT",
 "actual-size": SIZE,
-"format-specific": {
-"type": "IMGFMT",
-"data": {
-"compat": "1.1",
-"lazy-refcounts": false,
-"refcount-bits": 16,
-"corrupt": false
-}
-},
 "full-backing-filename": "TEST_DIR/t.IMGFMT.base",
 "backing-filename": "TEST_DIR/t.IMGFMT.base",
 "dirty-flag": false
-- 
2.23.0

Re: [PATCH for-5.0 v11 18/20] virtio-iommu: Support migration

2019-11-27 Thread Dr. David Alan Gilbert

* Eric Auger (eric.au...@redhat.com) wrote:
> Add Migration support. We rely on recently added gtree and qlist
> migration. Besides, we have to fixup end point <-> domain link.
> 
> Indeed each domain has a list of endpoints attached to it. And each
> endpoint has a pointer to its domain.
> 
> Raw gtree and qlist migration cannot handle this as it re-allocates
> all the nodes while reconstructing the trees/lists.
> 
> So in post_load we re-construct the relationship between endpoints
> and domains.
> 
> Signed-off-by: Eric Auger 

From the migration side of things,


Acked-by: Dr. David Alan Gilbert 
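
The post_load idea described in the commit message can be illustrated with a much simplified sketch: once the containers have been rebuilt from the stream, a second pass restores the back-pointers that were never serialized. The types and the function fixup_links() below are hypothetical. The actual patch works on a GTree of domains and QLISTs of endpoints and also replaces duplicate endpoint nodes, none of which is modelled here.

```c
#include <stddef.h>
#include <stdint.h>

struct domain;

/* Hypothetical, trimmed-down endpoint: only the id travels in the stream. */
struct endpoint {
    uint32_t id;
    struct domain *domain;       /* back-pointer, rebuilt after load */
    struct endpoint *next;
};

/* Hypothetical, trimmed-down domain with a singly linked endpoint list. */
struct domain {
    uint32_t id;
    struct endpoint *endpoints;
};

/* After loading, walk each domain's list and restore the back-pointers. */
static void fixup_links(struct domain *domains, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (struct endpoint *ep = domains[i].endpoints; ep; ep = ep->next) {
            ep->domain = &domains[i];
        }
    }
}
```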

> ---
>  hw/virtio/virtio-iommu.c | 127 ---
>  1 file changed, 117 insertions(+), 10 deletions(-)
> 
> diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
> index c5b202fab7..4e92fc0c95 100644
> --- a/hw/virtio/virtio-iommu.c
> +++ b/hw/virtio/virtio-iommu.c
> @@ -692,16 +692,6 @@ static void virtio_iommu_set_features(VirtIODevice *vdev, uint64_t val)
>  trace_virtio_iommu_set_features(dev->acked_features);
>  }
>  
> -/*
> - * Migration is not yet supported: most of the state consists
> - * of balanced binary trees which are not yet ready for getting
> - * migrated
> - */
> -static const VMStateDescription vmstate_virtio_iommu_device = {
> -.name = "virtio-iommu-device",
> -.unmigratable = 1,
> -};
> -
>  static gint int_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
>  {
>  uint ua = GPOINTER_TO_UINT(a);
> @@ -778,6 +768,123 @@ static void virtio_iommu_instance_init(Object *obj)
>  {
>  }
>  
> +#define VMSTATE_INTERVAL   \
> +{  \
> +.name = "interval",\
> +.version_id = 1,   \
> +.minimum_version_id = 1,   \
> +.fields = (VMStateField[]) {   \
> +VMSTATE_UINT64(low, viommu_interval),  \
> +VMSTATE_UINT64(high, viommu_interval), \
> +VMSTATE_END_OF_LIST()  \
> +}  \
> +}
> +
> +#define VMSTATE_MAPPING   \
> +{ \
> +.name = "mapping",\
> +.version_id = 1,  \
> +.minimum_version_id = 1,  \
> +.fields = (VMStateField[]) {  \
> +VMSTATE_UINT64(phys_addr, viommu_mapping),\
> +VMSTATE_UINT32(flags, viommu_mapping),\
> +VMSTATE_END_OF_LIST() \
> +},\
> +}
> +
> +static const VMStateDescription vmstate_interval_mapping[2] = {
> +VMSTATE_MAPPING,   /* value */
> +VMSTATE_INTERVAL   /* key   */
> +};
> +
> +static int domain_preload(void *opaque)
> +{
> +viommu_domain *domain = opaque;
> +
> +domain->mappings = g_tree_new_full((GCompareDataFunc)interval_cmp,
> +   NULL, g_free, g_free);
> +return 0;
> +}
> +
> +static const VMStateDescription vmstate_endpoint = {
> +.name = "endpoint",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT32(id, viommu_endpoint),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +static const VMStateDescription vmstate_domain = {
> +.name = "domain",
> +.version_id = 1,
> +.minimum_version_id = 1,
> +.pre_load = domain_preload,
> +.fields = (VMStateField[]) {
> +VMSTATE_UINT32(id, viommu_domain),
> +VMSTATE_GTREE_V(mappings, viommu_domain, 1,
> +vmstate_interval_mapping,
> +viommu_interval, viommu_mapping),
> +VMSTATE_QLIST_V(endpoint_list, viommu_domain, 1,
> +vmstate_endpoint, viommu_endpoint, next),
> +VMSTATE_END_OF_LIST()
> +}
> +};
> +
> +static gboolean reconstruct_ep_domain_link(gpointer key, gpointer value,
> +   gpointer data)
> +{
> +viommu_domain *d = (viommu_domain *)value;
> +viommu_endpoint *iter, *tmp;
> +viommu_endpoint *ep = (viommu_endpoint *)data;
> +
> +QLIST_FOREACH_SAFE(iter, &d->endpoint_list, next, tmp) {
> +if (iter->id == ep->id) {
> +/* remove the ep */
> +QLIST_REMOVE(iter, next);
> +g_free(iter);
> +/* replace it with the good one */
> +QLIST_INSERT_HEAD(&d->endpoint_list, ep, next);
> +/* update the domain */
> +ep->domain = d;
> +return true; /* stop the search */
> +}
> +}
> +return false; /* continue the traversal */
> +}
> +
> +static gboolean fix_endpoint(gpointer key, gpointer value, gpointer data)
> +{
> +VirtIOIOMMU *

Re: [PATCH v4 18/37] mips: baudbase is 115200 by default

2019-11-27 Thread Marc-André Lureau

Hi

On Mon, Nov 25, 2019 at 5:04 PM Philippe Mathieu-Daudé
 wrote:
>
> On 11/25/19 1:54 PM, Philippe Mathieu-Daudé wrote:
> > On 11/25/19 12:26 PM, Philippe Mathieu-Daudé wrote:
> >> On 11/25/19 11:12 AM, Marc-André Lureau wrote:
> >>> Hi
> >>>
> >>> On Mon, Nov 25, 2019 at 2:07 PM Aleksandar Markovic
> >>>  wrote:
> 
> 
> 
>  On Wednesday, November 20, 2019, Marc-André Lureau
>   wrote:
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >   hw/mips/mips_mipssim.c | 1 -
> >   1 file changed, 1 deletion(-)
> >
> > diff --git a/hw/mips/mips_mipssim.c b/hw/mips/mips_mipssim.c
> > index bfafa4d7e9..3cd0e6eb33 100644
> > --- a/hw/mips/mips_mipssim.c
> > +++ b/hw/mips/mips_mipssim.c
> > @@ -223,7 +223,6 @@ mips_mipssim_init(MachineState *machine)
> >   if (serial_hd(0)) {
> >   DeviceState *dev = qdev_create(NULL, TYPE_SERIAL_IO);
> >
> > -qdev_prop_set_uint32(DEVICE(dev), "baudbase", 115200);
> >   qdev_prop_set_chr(dev, "chardev", serial_hd(0));
> >   qdev_set_legacy_instance_id(dev, 0x3f8, 2);
> >   qdev_init_nofail(dev);
> > --
> 
> 
>  Please mention in your commit message where the default baudbase is
>  set.
> >>>
> >>> ok
> >>>
>  Also, is there a guarantee that the default value 115200 will never
>  change in the future?
> >>>
> >>> The level of stability on properties in general is unclear to me.
> >>>
> >>> Given that 115200 is standard for serial, it is unlikely to change
> >>> though.. We can have an assert there instead?
> >>>
> >>> Peter, what do you think? thanks
>
> IOW, until we merge Damien's "Clock framework API" series, I'd:
>
> - rename 'baudbase' -> 'input_frequency_hz'
>
> - set a 0 default value
>
>   DEFINE_PROP_UINT32("input-frequency-hz", SerialState,
>   input_frequency_hz, 0),
>
> - add a check in serial_realize()
>
>  if (s->input_frequency_hz == 0) {
>  error_setg(errp,
>"serial: input-frequency-hz property must be set");
>  return;
>  }
>
> [*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg642174.html
>

This is getting further away from this series goal, and my initial
goal. Let's add this to the backlog. I can drop a FIXME there.

> >> This property confused me in the past. It is _not_ the baudrate.
> >> It is the input frequency clocking the UART ('XIN' pin, Xtal INput).
> >>
> >> Each board has its own frequency, and it can even be variable (the
> >> clock domain tree can reconfigure it at a different rate).
> >
> > Laurent pointed me to the following commit which confirms my
> > interpretation:
> >
> > $ git show 038eaf82c853
> > commit 038eaf82c853f3bf8d4c106c0677bbf4adada7de
> > Author: Stefan Weil 
> > Date:   Sat Oct 31 11:28:11 2009 +0100
> >
> >  serial: Add interface to set reference oscillator frequency
> >
> >  Many (most?) serial interfaces have a programmable
> >  clock which provides the reference frequency ("baudbase").
> >  So a fixed baudbase which is only set once can be wrong.
> >
> >  omap1.c is an example which could use the new interface
> >  to change baudbase when the programmable clock changes.
> >  ar7 system emulation (still not part of standard QEMU)
> >  is similar to omap and already uses serial_set_frequency.
> >
> >  Signed-off-by: Stefan Weil 
> >  Signed-off-by: Anthony Liguori 
> >
> > diff --git a/hw/pc.h b/hw/pc.h
> > index 15fff8d103..03ffc91536 100644
> > --- a/hw/pc.h
> > +++ b/hw/pc.h
> > @@ -13,6 +13,7 @@ SerialState *serial_mm_init (target_phys_addr_t base,
> > int it_shift,
> >qemu_irq irq, int baudbase,
> >CharDriverState *chr, int ioregister);
> >   SerialState *serial_isa_init(int index, CharDriverState *chr);
> > +void serial_set_frequency(SerialState *s, uint32_t frequency);
> >
> >   /* parallel.c */
> >
> > diff --git a/hw/serial.c b/hw/serial.c
> > index fa12dcc075..0063260569 100644
> > --- a/hw/serial.c
> > +++ b/hw/serial.c
> > @@ -730,6 +730,13 @@ static void serial_init_core(SerialState *s)
> > serial_event, s);
> >   }
> >
> > +/* Change the main reference oscillator frequency. */
> > +void serial_set_frequency(SerialState *s, uint32_t frequency)
> > +{
> > +s->baudbase = frequency;
> > +serial_update_parameters(s);
> > +}
> > +
>
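
To make the baudbase versus baud rate distinction concrete, here is a small arithmetic sketch. It assumes the usual 16550-style relation, line speed = baudbase / divisor latch, which is my reading of how the serial model uses the property and should be treated as an assumption. uart_line_speed() is a hypothetical helper, not a QEMU function.

```c
#include <stdint.h>

/*
 * Hypothetical helper: line speed of a 16550-style UART, assuming
 * speed = baudbase / divisor. A zero divisor is treated as "not
 * programmed" and yields 0 instead of dividing by zero.
 */
static uint32_t uart_line_speed(uint32_t baudbase, uint16_t divisor)
{
    if (divisor == 0) {
        return 0;
    }
    return baudbase / divisor;
}
```

With the default baudbase of 115200, a guest that programs a divisor of 12 gets 9600 baud, and a divisor of 1 gives 115200. Changing baudbase therefore changes every rate the guest can select, which is why a per-board value matters.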

Re: [PATCH v4 18/37] mips: baudbase is 115200 by default

2019-11-27 Thread Aleksandar Markovic

On Wednesday, November 27, 2019, Marc-André Lureau <
marcandre.lur...@redhat.com> wrote:

> Hi
>
> On Mon, Nov 25, 2019 at 5:04 PM Philippe Mathieu-Daudé
>  wrote:
> >
> > On 11/25/19 1:54 PM, Philippe Mathieu-Daudé wrote:
> > > On 11/25/19 12:26 PM, Philippe Mathieu-Daudé wrote:
> > >> On 11/25/19 11:12 AM, Marc-André Lureau wrote:
> > >>> Hi
> > >>>
> > >>> On Mon, Nov 25, 2019 at 2:07 PM Aleksandar Markovic
> > >>>  wrote:
> > 
> > 
> > 
> >  On Wednesday, November 20, 2019, Marc-André Lureau
> >   wrote:
> > >
> > > Signed-off-by: Marc-André Lureau 
> > > ---
> > >   hw/mips/mips_mipssim.c | 1 -
> > >   1 file changed, 1 deletion(-)
> > >
> > > diff --git a/hw/mips/mips_mipssim.c b/hw/mips/mips_mipssim.c
> > > index bfafa4d7e9..3cd0e6eb33 100644
> > > --- a/hw/mips/mips_mipssim.c
> > > +++ b/hw/mips/mips_mipssim.c
> > > @@ -223,7 +223,6 @@ mips_mipssim_init(MachineState *machine)
> > >   if (serial_hd(0)) {
> > >   DeviceState *dev = qdev_create(NULL, TYPE_SERIAL_IO);
> > >
> > > -qdev_prop_set_uint32(DEVICE(dev), "baudbase", 115200);
> > >   qdev_prop_set_chr(dev, "chardev", serial_hd(0));
> > >   qdev_set_legacy_instance_id(dev, 0x3f8, 2);
> > >   qdev_init_nofail(dev);
> > > --
> > 
> > 
> >  Please mention in your commit message where the default baudbase is
> >  set.
> > >>>
> > >>> ok
> > >>>
> >  Also, is there a guarantee that the default value 115200 will never
> >  change in the future?
> > >>>
> > >>> The level of stability on properties in general is unclear to me.
> > >>>
> > >>> Given that 115200 is standard for serial, it is unlikely to change
> > >>> though.. We can have an assert there instead?
> > >>>
> > >>> Peter, what do you think? thanks
> >
> > IOW, until we merge Damien's "Clock framework API" series, I'd:
> >
> > - rename 'baudbase' -> 'input_frequency_hz'
> >
> > - set a 0 default value
> >
> >   DEFINE_PROP_UINT32("input-frequency-hz", SerialState,
> >   input_frequency_hz, 0),
> >
> > - add a check in serial_realize()
> >
> >  if (s->input_frequency_hz == 0) {
> >  error_setg(errp,
> >"serial: input-frequency-hz property must be set");
> >  return;
> >  }
> >
> > [*] https://www.mail-archive.com/qemu-devel@nongnu.org/msg642174.html
> >
>
> This is getting further away from this series goal, and my initial
> goal. Let's add this to the backlog. I can drop a FIXME there.


I agree. But please update commit message and/or add FIXME so that future
readers are given at least some background.

Reviewed-by: Aleksandar Markovic 


>
> > >> This property confused me in the past. It is _not_ the baudrate.
> > >> It is the input frequency clocking the UART ('XIN' pin, Xtal INput).
> > >>
> > >> Each board has its own frequency, and it can even be variable (the
> > >> clock domain tree can reconfigure it at a different rate).
> > >
> > > Laurent pointed me to the following commit which confirms my
> > > interpretation:
> > >
> > > $ git show 038eaf82c853
> > > commit 038eaf82c853f3bf8d4c106c0677bbf4adada7de
> > > Author: Stefan Weil 
> > > Date:   Sat Oct 31 11:28:11 2009 +0100
> > >
> > >  serial: Add interface to set reference oscillator frequency
> > >
> > >  Many (most?) serial interfaces have a programmable
> > >  clock which provides the reference frequency ("baudbase").
> > >  So a fixed baudbase which is only set once can be wrong.
> > >
> > >  omap1.c is an example which could use the new interface
> > >  to change baudbase when the programmable clock changes.
> > >  ar7 system emulation (still not part of standard QEMU)
> > >  is similar to omap and already uses serial_set_frequency.
> > >
> > >  Signed-off-by: Stefan Weil 
> > >  Signed-off-by: Anthony Liguori 
> > >
> > > diff --git a/hw/pc.h b/hw/pc.h
> > > index 15fff8d103..03ffc91536 100644
> > > --- a/hw/pc.h
> > > +++ b/hw/pc.h
> > > @@ -13,6 +13,7 @@ SerialState *serial_mm_init (target_phys_addr_t base,
> > > int it_shift,
> > >qemu_irq irq, int baudbase,
> > >CharDriverState *chr, int ioregister);
> > >   SerialState *serial_isa_init(int index, CharDriverState *chr);
> > > +void serial_set_frequency(SerialState *s, uint32_t frequency);
> > >
> > >   /* parallel.c */
> > >
> > > diff --git a/hw/serial.c b/hw/serial.c
> > > index fa12dcc075..0063260569 100644
> > > --- a/hw/serial.c
> > > +++ b/hw/serial.c
> > > @@ -730,6 +730,13 @@ static void serial_init_core(SerialState *s)
> > > serial_event, s);
> > >   }
> > >
> > > +/* Change the main reference oscillator frequency. */
> > > +void serial_set_frequency(SerialState *s, uint32_t frequency)
> > > +{
> > > +s->baudbase = frequency;
> > > +serial_update_parameters(s);
> > > +}
> > > +
> >
>
>

Re: [PATCH v4 17/37] mips: inline serial_init()

2019-11-27 Thread Aleksandar Markovic

On Wednesday, November 20, 2019, Marc-André Lureau <
marcandre.lur...@redhat.com> wrote:

> The function is specific to mipssim, let's inline it.
>
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/char/serial.c | 16 
>  hw/mips/mips_mipssim.c   | 15 ---
>  include/hw/char/serial.h |  2 --
>  3 files changed, 12 insertions(+), 21 deletions(-)
>
>
Reviewed-by: Aleksandar Markovic 


> diff --git a/hw/char/serial.c b/hw/char/serial.c
> index 164146ede8..23f0b02516 100644
> --- a/hw/char/serial.c
> +++ b/hw/char/serial.c
> @@ -1023,22 +1023,6 @@ static const TypeInfo serial_io_info = {
>  .class_init = serial_io_class_init,
>  };
>
> -SerialIO *serial_init(int base, qemu_irq irq, int baudbase,
> - Chardev *chr, MemoryRegion *system_io)
> -{
> -SerialIO *sio = SERIAL_IO(qdev_create(NULL, TYPE_SERIAL_IO));
> -
> -qdev_prop_set_uint32(DEVICE(sio), "baudbase", baudbase);
> -qdev_prop_set_chr(DEVICE(sio), "chardev", chr);
> -qdev_set_legacy_instance_id(DEVICE(sio), base, 2);
> -qdev_init_nofail(DEVICE(sio));
> -
> -sysbus_connect_irq(SYS_BUS_DEVICE(sio), 0, irq);
> -memory_region_add_subregion(system_io, base, &sio->serial.io);
> -
> -return sio;
> -}
> -
>  static Property serial_properties[] = {
>  DEFINE_PROP_CHR("chardev", SerialState, chr),
>  DEFINE_PROP_UINT32("baudbase", SerialState, baudbase, 115200),
> diff --git a/hw/mips/mips_mipssim.c b/hw/mips/mips_mipssim.c
> index 282bbecb24..bfafa4d7e9 100644
> --- a/hw/mips/mips_mipssim.c
> +++ b/hw/mips/mips_mipssim.c
> @@ -40,6 +40,7 @@
>  #include "hw/loader.h"
>  #include "elf.h"
>  #include "hw/sysbus.h"
> +#include "hw/qdev-properties.h"
>  #include "exec/address-spaces.h"
>  #include "qemu/error-report.h"
>  #include "sysemu/qtest.h"
> @@ -219,9 +220,17 @@ mips_mipssim_init(MachineState *machine)
>   * A single 16450 sits at offset 0x3f8. It is attached to
>   * MIPS CPU INT2, which is interrupt 4.
>   */
> -if (serial_hd(0))
> -serial_init(0x3f8, env->irq[4], 115200, serial_hd(0),
> -get_system_io());
> +if (serial_hd(0)) {
> +DeviceState *dev = qdev_create(NULL, TYPE_SERIAL_IO);
> +
> +qdev_prop_set_uint32(DEVICE(dev), "baudbase", 115200);
> +qdev_prop_set_chr(dev, "chardev", serial_hd(0));
> +qdev_set_legacy_instance_id(dev, 0x3f8, 2);
> +qdev_init_nofail(dev);
> +sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, env->irq[4]);
> +memory_region_add_subregion(get_system_io(), 0x3f8,
> +&SERIAL_IO(dev)->serial.io);
> +}
>
>  if (nd_table[0].used)
>  /* MIPSnet uses the MIPS CPU INT0, which is interrupt 2. */
> diff --git a/include/hw/char/serial.h b/include/hw/char/serial.h
> index d356ba838c..535fa23a2b 100644
> --- a/include/hw/char/serial.h
> +++ b/include/hw/char/serial.h
> @@ -108,8 +108,6 @@ void serial_set_frequency(SerialState *s, uint32_t
> frequency);
>  #define TYPE_SERIAL_IO "serial-io"
>  #define SERIAL_IO(s) OBJECT_CHECK(SerialIO, (s), TYPE_SERIAL_IO)
>
> -SerialIO *serial_init(int base, qemu_irq irq, int baudbase,
> -  Chardev *chr, MemoryRegion *system_io);
>  SerialMM *serial_mm_init(MemoryRegion *address_space,
>   hwaddr base, int regshift,
>   qemu_irq irq, int baudbase,
> --
> 2.24.0
>
>
>

Re: [PATCH v4 20/37] mips: use sysbus_mmio_get_region() instead of internal fields

2019-11-27 Thread Aleksandar Markovic

On Wednesday, November 20, 2019, Marc-André Lureau <
marcandre.lur...@redhat.com> wrote:

> Register the memory region with sysbus_init_mmio() and look it up with
> sysbus_mmio_get_region() to avoid accessing internal device fields.
>
> Suggested-by: Peter Maydell 
> Signed-off-by: Marc-André Lureau 
> ---
>  hw/char/serial.c   | 1 +
>  hw/mips/mips_mipssim.c | 3 ++-
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
>
Reviewed-by: Aleksandar Markovic 


> diff --git a/hw/char/serial.c b/hw/char/serial.c
> index 23f0b02516..02c545ff8c 100644
> --- a/hw/char/serial.c
> +++ b/hw/char/serial.c
> @@ -993,6 +993,7 @@ static void serial_io_realize(DeviceState *dev, Error
> **errp)
>  qdev_init_nofail(DEVICE(s));
>
>  memory_region_init_io(&s->io, NULL, &serial_io_ops, s, "serial", 8);
> +sysbus_init_mmio(SYS_BUS_DEVICE(sio), &s->io);
>  sysbus_init_irq(SYS_BUS_DEVICE(sio), &s->irq);
>  }
>
> diff --git a/hw/mips/mips_mipssim.c b/hw/mips/mips_mipssim.c
> index 2c2c7f25b2..84c03dd035 100644
> --- a/hw/mips/mips_mipssim.c
> +++ b/hw/mips/mips_mipssim.c
> @@ -227,7 +227,8 @@ mips_mipssim_init(MachineState *machine)
>  qdev_set_legacy_instance_id(dev, 0x3f8, 2);
>  qdev_init_nofail(dev);
>  sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, env->irq[4]);
> -sysbus_add_io(SYS_BUS_DEVICE(dev), 0x3f8, &SERIAL_IO(dev)->
> serial.io);
> +sysbus_add_io(SYS_BUS_DEVICE(dev), 0x3f8,
> +  sysbus_mmio_get_region(SYS_BUS_DEVICE(dev), 0));
>  }
>
>  if (nd_table[0].used)
> --
> 2.24.0
>
>
>

Re: [PATCH 0/7] console: screendump improvements

2019-11-27 Thread no-reply

Patchew URL: 
https://patchew.org/QEMU/20191127115202.375107-1-marcandre.lur...@redhat.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH 0/7] console: screendump improvements
Type: series
Message-id: 20191127115202.375107-1-marcandre.lur...@redhat.com

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
fatal: git fetch_pack: expected ACK/NAK, got 'ERR upload-pack: not our ref 
4aa8cc3583eabafc01c50020f7887c3509a8f0c4'
fatal: The remote end hung up unexpectedly
error: Could not fetch 3c8cf5a9c21ff8782164d1def7f44bd888713384
Traceback (most recent call last):
  File "patchew-tester/src/patchew-cli", line 521, in test_one
git_clone_repo(clone, r["repo"], r["head"], logf, True)
  File "patchew-tester/src/patchew-cli", line 48, in git_clone_repo
stdout=logf, stderr=logf)
  File "/opt/rh/rh-python36/root/usr/lib64/python3.6/subprocess.py", line 291, 
in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['git', 'remote', 'add', '-f', 
'--mirror=fetch', '3c8cf5a9c21ff8782164d1def7f44bd888713384', 
'https://github.com/patchew-project/qemu']' returned non-zero exit status 1.



The full log is available at
http://patchew.org/logs/20191127115202.375107-1-marcandre.lur...@redhat.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v4 19/37] mips: use sysbus_add_io()

2019-11-27 Thread Aleksandar Markovic

On Wednesday, November 20, 2019, Marc-André Lureau <
marcandre.lur...@redhat.com> wrote:

> Signed-off-by: Marc-André Lureau 
> ---
>  hw/mips/mips_mipssim.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>
>
I agree with the change, and with overall series in general, but please add
at least a sentence in the commit message here, explaining what is achieved
by the change.

With that sentence, certainly:

Reviewed-by: Aleksandar Markovic 


> diff --git a/hw/mips/mips_mipssim.c b/hw/mips/mips_mipssim.c
> index 3cd0e6eb33..2c2c7f25b2 100644
> --- a/hw/mips/mips_mipssim.c
> +++ b/hw/mips/mips_mipssim.c
> @@ -227,8 +227,7 @@ mips_mipssim_init(MachineState *machine)
>  qdev_set_legacy_instance_id(dev, 0x3f8, 2);
>  qdev_init_nofail(dev);
>  sysbus_connect_irq(SYS_BUS_DEVICE(dev), 0, env->irq[4]);
> -memory_region_add_subregion(get_system_io(), 0x3f8,
> -&SERIAL_IO(dev)->serial.io);
> +sysbus_add_io(SYS_BUS_DEVICE(dev), 0x3f8, &SERIAL_IO(dev)->
> serial.io);
>  }
>
>  if (nd_table[0].used)
> --
> 2.24.0
>
>
>

Re: [PATCH V2] throttle-groups: fix memory leak in throttle_group_set_limit:

2019-11-27 Thread Max Reitz

On 27.11.19 07:20, pannengy...@huawei.com wrote:
> From: PanNengyuan 
> 
> This avoids a memory leak when qom-set is called to set throttle_group
> limits. Here is an easy way to reproduce it:
> 
> 1. Run qemu-iotests as follows and check the result with ASan:
>./check -qcow2 184
> 
> The ASan backtrace follows:
> Direct leak of 912 byte(s) in 3 object(s) allocated from:
> #0 0x8d7ab3c3 in __interceptor_calloc   (/lib64/libasan.so.4+0xd33c3)
> #1 0x8d4c31cb in g_malloc0   (/lib64/libglib-2.0.so.0+0x571cb)
> #2 0x190c857 in qobject_input_start_struct  
> /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qobject-input-visitor.c:295
> #3 0x19070df in visit_start_struct   
> /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qapi/qapi-visit-core.c:49
> #4 0x1948b87 in visit_type_ThrottleLimits   
> qapi/qapi-visit-block-core.c:3759
> #5 0x17e4aa3 in throttle_group_set_limits   
> /mnt/sdc/qemu-master/qemu-4.2.0-rc0/block/throttle-groups.c:900
> #6 0x1650eff in object_property_set 
> /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/object.c:1272
> #7 0x1658517 in object_property_set_qobject   
> /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qobject.c:26
> #8 0x15880bb in qmp_qom_set  
> /mnt/sdc/qemu-master/qemu-4.2.0-rc0/qom/qom-qmp-cmds.c:74
> #9 0x157e3e3 in qmp_marshal_qom_set  qapi/qapi-commands-qom.c:154
> 
> Reported-by: Euler Robot 
> Signed-off-by: PanNengyuan 
> ---
> Changes v2 to v1:
> - remove unused var 'arg' (suggested by Alberto Garcia)
> ---
>  block/throttle-groups.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)

Thanks, applied to my block-next branch for 5.0:

https://git.xanclic.moe/XanClic/qemu/commits/branch/block-next

Max




Re: PCI memory sync question (kvm,dpdk,e1000,packet stalled)

2019-11-27 Thread ASM

Stefan, thanks for answering.

When a packet is received, e1000 writes it directly to memory
without any RCU protection.
The memory address to write to is set by the DPDK driver:
the driver writes the ring base address to RDBA (RDBAH, RDBAL).

It turns out that the MMIO RCU (mentioned in connection with
e1000_mmio_setup) does not, and cannot, protect the ring descriptors.
The area to protect may be anywhere in RAM, and it only
becomes known when the driver writes to the RDBA registers.
(See the 82574 GbE Controller datasheet, "7.1.8 Receive Descriptor Queue Structure".)

How can this memory be protected? As I understand it, e1000 should
track writes to RDBA and enable memory protection for that region.
But what is the right way to do it?

QEMU source code:
hw/net/e1000.c:954 (master)

 954 base = rx_desc_base(s) + sizeof(desc) * s->mac_reg[RDH];
where rx_desc_base() returns the address held in the RDBAH/RDBAL registers. This address has no RCU protection.
...
955 pci_dma_read(d, base, &desc, sizeof(desc));
...
957 desc.status |= (vlan_status | E1000_RXD_STAT_DD);
...
990 pci_dma_write(d, base, &desc, sizeof(desc));
->
exec.c:
3111 static MemTxResult flatview_write_continue(FlatView *fv, hwaddr addr,
3112MemTxAttrs attrs,
3113const uint8_t *buf,
3114hwaddr len, hwaddr addr1,
3115hwaddr l, MemoryRegion *mr)
3116 {
...
3123 if (!memory_access_is_direct(mr, true)) {
(false)
3131 } else {
3132 /* RAM case */
3133 ptr = qemu_ram_ptr_length(mr->ram_block, addr1, &l, false);
3134 memcpy(ptr, buf, l);

This is where I see the weird behavior with KVM due to MMIO write coalescing.

3135 invalidate_and_set_dirty(mr, addr1, l);
3136 }
3137

Source code dpdk(e1000): (version dpdk-stable-17.11.9)
drivers/net/e1000/em_rxtx.c:

699 uint16_t
700 eth_em_recv_pkts(void *rx_queue, struct rte_mbuf **rx_pkts,
701 uint16_t nb_pkts)
...
718 rxq = rx_queue
...
722 rx_id = rxq->rx_tail;
723 rx_ring = rxq->rx_ring
...
734 rxdp = &rx_ring[rx_id];
735 status = rxdp->status;
736 if (! (status & E1000_RXD_STAT_DD))
737 break;
...
807 rxdp->buffer_addr = dma_addr;
808 rxdp->status = 0;
This is where I see the weird behavior with KVM due to MMIO write
coalescing.


P.S.
> Also, is DPDK accessing the e1000 device from more than 1 vCPU?
All tests were run on a single virtual CPU.

I created a GitHub project for quick reproduction of this error:
https://github.com/BASM/qemu_dpdk_e1000_test

---
Best regards,
Leonid Myravjev
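
For readers following along, the descriptor handshake discussed above can be summarized in a single-threaded sketch. The struct and both functions are hypothetical and heavily trimmed. The bit values are assumed to match the e1000 definitions (DD = 0x01, EOP = 0x02), and the sketch does not model the concurrency that the problem report is about.

```c
#include <stdint.h>

#define RXD_STAT_DD  0x01u  /* descriptor done (assumed value) */
#define RXD_STAT_EOP 0x02u  /* end of packet (assumed value) */

/* Hypothetical, trimmed-down receive descriptor. */
struct rx_desc {
    uint64_t buffer_addr;
    uint16_t length;
    volatile uint8_t status;
};

/* Device side: fill in the length, then publish the descriptor. */
static void device_complete(struct rx_desc *d, uint16_t len)
{
    d->length = len;
    d->status |= RXD_STAT_DD | RXD_STAT_EOP;
}

/*
 * Driver side: returns the packet length, or -1 if the descriptor is
 * not done yet. Clearing status hands the descriptor back to the device.
 */
static int driver_poll(struct rx_desc *d, uint64_t new_buffer)
{
    if (!(d->status & RXD_STAT_DD)) {
        return -1;
    }
    int len = d->length;

    d->buffer_addr = new_buffer;
    d->status = 0;
    return len;
}
```

In this model, a late write-back of a stale copy of the whole descriptor after the driver has cleared it would leave the status set. Whether that is what actually happens here is not established in this thread.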

On Thu, 21 Nov 2019 at 17:05, Stefan Hajnoczi  wrote:
>
> On Wed, Nov 20, 2019 at 08:36:32PM +0300, ASM wrote:
> > I am trying to solve a problem with packets stalling (e1000, tap, kvm).
> > My investigation led to the following:
> > 1. In flatview_write_continue() I see that e1000 writes the value
> > "7" to the STAT register.
> > 2. The driver in the target OS reads the STAT register with the value "7"
> > and writes the value "0" to the register.
> > 3. In flatview_write_continue() (with my edits):
> > memcpy(ptr, buf, l);
> > new1=ptr[0xc];
> > usleep(100);
> > new2=ptr[0xc];
> > invalidate_and_set_dirty(mr, addr1, l);
> > new3=ptr[0xc];
> > printf("Old: %i, new1, %i, new2: %i, new3: %i\n", old,new1,new2,new3);
> >
> > I see what memory in first printf is "7", but after usleep() is "0".
> > Do I understand correctly that this should not be? Or RCU lock
> > suggests the ability to the multiple writers?
> >
> > The problem is that qemu(e1000) writes the number 7, after which
> > target(dpdk driver) reads 7, on the basis of this it writes the number
> > 0, but as a result (extremely rarely), the value STATUS still remains
> > 7. Therefore, packet processing is interrupted. This behavior is
> > observed only on kvm (it is not observed on tcg).
> >
> > Please help with advice or ideas.
>
> Hi Leonid,
> Could you be seeing weird behavior with KVM due to MMIO write
> coalescing?
>
>   static void e1000_mmio_setup(E1000State *d)
>   {
>   int i;
>   const uint32_t excluded_regs[] = {
>   E1000_MDIC, E1000_ICR, E1000_ICS, E1000_IMS,
>   E1000_IMC, E1000_TCTL, E1000_TDT, PNPMMIO_SIZE
>   };
>
>   memory_region_init_io(&d->mmio, OBJECT(d), &e1000_mmio_ops, d,
> "e1000-mmio", PNPMMIO_SIZE);
>   memory_region_add_coalescing(&d->mmio, 0, excluded_regs[0]);
>   for (i = 0; excluded_regs[i] != PNPMMIO_SIZE; i++)
>   memory_region_add_coalescing(&d->mmio, excluded_regs[i] + 4,
>excluded_regs[i+1] - excluded_regs[i] 
> - 4);
>   memory_region_init_io(&d->io, OBJECT(d), &e1000_io_ops, d, "e1000-io", 
> IOPORT_SI

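The loop quoted above registers every gap between two excluded registers as a coalesced MMIO range. A standalone sketch of that range computation follows, assuming a sorted list of 4-byte registers terminated by the region size; struct range and coalesced_ranges() are hypothetical names, not QEMU API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct range {
    uint32_t offset;
    uint32_t size;
};

/*
 * excluded[] holds the offsets of 4-byte registers that must NOT be
 * coalesced, sorted ascending and terminated by region_size (the same
 * sentinel trick as PNPMMIO_SIZE in the quoted code).
 * out[] needs room for one entry per element of excluded[].
 * Returns the number of ranges written.
 */
static size_t coalesced_ranges(const uint32_t *excluded, uint32_t region_size,
                               struct range *out)
{
    size_t n = 0;

    assert(excluded[0] <= region_size);

    /* Everything below the first excluded register is coalesced. */
    out[n++] = (struct range){ 0, excluded[0] };

    /* Then each gap between one excluded register and the next. */
    for (size_t i = 0; excluded[i] != region_size; i++) {
        out[n++] = (struct range){
            excluded[i] + 4,
            excluded[i + 1] - excluded[i] - 4,
        };
    }
    return n;
}
```

Any register that falls inside one of the returned ranges has its writes batched by KVM, so the device model sees them later than the guest issued them.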
Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-27 Thread Xiang Zheng

Hi Beata,

Thanks for your review!

On 2019/11/22 23:47, Beata Michalska wrote:
> Hi,
> 
> On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:
>>
>> From: Dongjiu Geng 
>>
>> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
>> translates the host VA delivered by host to guest PA, then fills this PA
>> to guest APEI GHES memory, then notifies guest according to the SIGBUS
>> type.
>>
>> When guest accesses the poisoned memory, it will generate a Synchronous
>> External Abort(SEA). Then host kernel gets an APEI notification and calls
>> memory_failure() to unmapped the affected page in stage 2, finally
>> returns to guest.
>>
>> Guest continues to access the PG_hwpoison page, it will trap to KVM as
>> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
>> Qemu, Qemu records this error address into guest APEI GHES memory and
>> notifes guest using Synchronous-External-Abort(SEA).
>>
>> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
>> in which we can setup the type of exception and the syndrome information.
>> When switching to guest, the target vcpu will jump to the synchronous
>> external abort vector table entry.
>>
>> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
>> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
>> not valid and hold an UNKNOWN value. These values will be set to KVM
>> register structures through KVM_SET_ONE_REG IOCTL.
>>
>> Signed-off-by: Dongjiu Geng 
>> Signed-off-by: Xiang Zheng 
>> Reviewed-by: Michael S. Tsirkin 
>> ---
>>  hw/acpi/acpi_ghes.c | 297 
>>  include/hw/acpi/acpi_ghes.h |   4 +
>>  include/sysemu/kvm.h|   3 +-
>>  target/arm/cpu.h|   4 +
>>  target/arm/helper.c |   2 +-
>>  target/arm/internals.h  |   5 +-
>>  target/arm/kvm64.c  |  64 
>>  target/arm/tlb_helper.c |   2 +-
>>  target/i386/cpu.h   |   2 +
>>  9 files changed, 377 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
>> index 42c00ff3d3..f5b54990c0 100644
>> --- a/hw/acpi/acpi_ghes.c
>> +++ b/hw/acpi/acpi_ghes.c
>> @@ -39,6 +39,34 @@
>>  /* The max size in bytes for one error block */
>>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
>>
>> +/*
>> + * The total size of Generic Error Data Entry
>> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-343 Generic Error Data Entry
>> + */
>> +#define ACPI_GHES_DATA_LENGTH   72
>> +
>> +/*
>> + * The memory section CPER size,
>> + * UEFI 2.6: N.2.5 Memory Error Section
>> + */
>> +#define ACPI_GHES_MEM_CPER_LENGTH   80
>> +
>> +/*
>> + * Masks for block_status flags
>> + */
>> +#define ACPI_GEBS_UNCORRECTABLE 1
> 
> Why not listing all supported statuses ? Similar to error severity below ?
> 

We currently use only the first bit, for uncorrectable errors. Correctable errors
are handled in the host and are not delivered to QEMU.

I think it's unnecessary to list all the bit masks.

>> +
>> +/*
>> + * Values for error_severity field
>> + */
>> +enum AcpiGenericErrorSeverity {
>> +ACPI_CPER_SEV_RECOVERABLE,
>> +ACPI_CPER_SEV_FATAL,
>> +ACPI_CPER_SEV_CORRECTED,
>> +ACPI_CPER_SEV_NONE,
>> +};
>> +
>>  /*
>>   * Now only support ARMv8 SEA notification type error source
>>   */
>> @@ -49,6 +77,16 @@
>>   */
>>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
>>
>> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)\
>> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 
>> 0xff, \
>> +((b) >> 8) & 0xff, (b) & 0xff,   \
>> +((c) >> 8) & 0xff, (c) & 0xff,\
>> +(d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
>> +
>> +#define UEFI_CPER_SEC_PLATFORM_MEM   \
>> +UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
>> +0xED, 0x7C, 0x83, 0xB1)
>> +
>>  /*
>>   * | +--+ 0
>>   * | |Header|
>> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
>>  uint64_t ghes_addr_le;
>>  } AcpiGhesState;
>>
>> +/*
>> + * Total size for Generic Error Status Block
>> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
>> + * Table 18-380 Generic Error Status Block
>> + */
>> +#define ACPI_GHES_GESB_SIZE 20
> 
> Minor: This is not entirely correct: GEDE is part of GESB so the total length
> would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
> 

Yes, here it only indicates the total length of the Generic Error Status Block
structure, excluding the GEDEs.
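
To make the size relationship concrete, here is a small sketch using the constants from the patch. It assumes each Generic Error Data Entry is followed by one section payload of a fixed size (80 bytes for the memory CPER); the helper names are hypothetical and not part of the series.

```c
#include <assert.h>
#include <stdint.h>

#define GESB_HEADER_SIZE  20  /* ACPI_GHES_GESB_SIZE in the patch */
#define GEDE_SIZE         72  /* ACPI_GHES_DATA_LENGTH in the patch */
#define MEM_CPER_SIZE     80  /* ACPI_GHES_MEM_CPER_LENGTH in the patch */

/*
 * Bytes that follow the status block header: n entries, each made of
 * a GEDE header plus its section payload (assumed fixed-size here).
 */
static uint32_t gesb_data_length(uint32_t n_entries, uint32_t section_size)
{
    return n_entries * (GEDE_SIZE + section_size);
}

/* Header plus all entries: the full size of one error status block. */
static uint32_t gesb_total_length(uint32_t n_entries, uint32_t section_size)
{
    return GESB_HEADER_SIZE + gesb_data_length(n_entries, section_size);
}
```

With a single memory error entry this gives 152 bytes of data and 172 bytes in total, which fits well within ACPI_GHES_MAX_RAW_DATA_LENGTH.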

>> +/* The offset of Data Length in Generic Error Status Block */
>> +#define ACPI_GHES_GESB_DATA_LENGTH_OFFSET   12
>> +
> 
> If those were nicely represented as structures you get the offsets easily
> without having number of defines. That could simplify the code and make it
> more readable - see comments below
> 

To address Igor's comment, this macro is useless and I

Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-27 Thread Igor Mammedov

On Wed, 27 Nov 2019 20:47:15 +0800
Xiang Zheng  wrote:

> Hi Beata,
> 
> Thanks for you review!
> 
> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> > 
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:  
> >>
> >> From: Dongjiu Geng 
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmapped the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifes guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng 
> >> Signed-off-by: Xiang Zheng 
> >> Reviewed-by: Michael S. Tsirkin 
> >> ---
[...]
> >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> >> index cb62ec9c7b..8e3c5b879e 100644
> >> --- a/include/hw/acpi/acpi_ghes.h
> >> +++ b/include/hw/acpi/acpi_ghes.h
> >> @@ -24,6 +24,9 @@
> >>
> >>  #include "hw/acpi/bios-linker-loader.h"
> >>
> >> +#define ACPI_GHES_CPER_OK   1
> >> +#define ACPI_GHES_CPER_FAIL 0
> >> +  
> > 
> > Is there really a need to introduce those ?
> >   
> 
> Don't you think it's more clear than using "1" or "0"? :)

or maybe just reuse default libc return convention: 0 - ok, -1 - fail
and drop custom macros
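
Something along these lines, as a rough sketch only. record_error() is a made-up stand-in for the function that currently returns ACPI_GHES_CPER_OK/FAIL, and its body is not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical recorder following the libc convention:
 * returns 0 on success, -1 on failure.
 */
static int record_error(uint64_t *slot, uint64_t phys_addr)
{
    if (slot == NULL) {
        return -1;      /* nowhere to store the address */
    }
    *slot = phys_addr;
    return 0;
}

/*
 * Hypothetical caller: the usual "< 0 means error" test reads naturally
 * and needs no custom OK/FAIL macros.
 */
static int handle_fault(uint64_t *slot, uint64_t phys_addr)
{
    if (record_error(slot, phys_addr) < 0) {
        return -1;      /* propagate the failure unchanged */
    }
    return 0;
}
```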

> 
> >>  /*
> >>   * Values for Hardware Error Notification Type field
> >>   */
[...]

Re: [PATCH] block: always fill entire LUKS header space with zeros

2019-11-27 Thread Max Reitz

On 25.11.19 18:43, Daniel P. Berrangé wrote:
> When initializing the LUKS header the size with default encryption
> parameters will currently be 2068480 bytes. This is rounded up to
> a multiple of the cluster size, 2081792, with 64k sectors. If the
> end of the header is not the same as the end of the cluster we fill
> the extra space with zeros. This was forgetting that not even the
> space allocated for the header will be fully initialized, as we
> only write key material for the first key slot. The space left
> for the other 7 slots is never written to.
> 
> An optimization to the ref count checking code:
> 
>   commit a5fff8d4b4d928311a5005efa12d0991fe3b66f9 (refs/bisect/bad)
>   Author: Vladimir Sementsov-Ogievskiy 
>   Date:   Wed Feb 27 16:14:30 2019 +0300
> 
> qcow2-refcount: avoid eating RAM
> 
> made the assumption that every cluster which was allocated would
> have at least some data written to it. This was violated by way
> the LUKS header is only partially written, with much space simply
> reserved for future use.
> 
> Depending on the cluster size this problem was masked by the
> logic which wrote zeros between the end of the LUKS header and
> the end of the cluster.
> 
> $ qemu-img create --object secret,id=cluster_encrypt0,data=123456 \
>-f qcow2 -o cluster_size=2k,encrypt.iter-time=1,\
>encrypt.format=luks,encrypt.key-secret=cluster_encrypt0 \
>  cluster_size_check.qcow2 100M
>   Formatting 'cluster_size_check.qcow2', fmt=qcow2 size=104857600
> encrypt.format=luks encrypt.key-secret=cluster_encrypt0
> encrypt.iter-time=1 cluster_size=2048 lazy_refcounts=off refcount_bits=16
> 
> $ qemu-img check --object secret,id=cluster_encrypt0,data=redhat \
> 'json:{"driver": "qcow2", "encrypt.format": "luks", \
>"encrypt.key-secret": "cluster_encrypt0", \
>  "file.driver": "file", "file.filename": "cluster_size_check.qcow2"}'
> ERROR: counting reference for region exceeding the end of the file by one 
> cluster or more: offset 0x2000 size 0x1f9000
> Leaked cluster 4 refcount=1 reference=0
> ...snip...
> Leaked cluster 130 refcount=1 reference=0
> 
> 1 errors were found on the image.
> Data may be corrupted, or further writes to the image may corrupt it.
> 
> 127 leaked clusters were found on the image.
> This means waste of disk space, but no harm to data.
> Image end offset: 268288
> 
> The problem only exists when the disk image is entirely empty. Writing
> data to the disk image payload will solve the problem by causing the
> end of the file to be extended further.
> 
> The change fixes it by ensuring that the entire allocated LUKS header
> region is fully initialized with zeros. The qemu-img check will still
> fail for any pre-existing disk images created prior to this change,
> unless at least 1 byte of the payload is written to.
> 
> Fully writing zeros to the entire LUKS header is a good idea regardless
> as it ensures that space has been allocated on the host filesystem (or
> whatever block storage backend is used).
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  block/qcow2.c  |   4 +-
>  tests/qemu-iotests/278 |  88 +
>  tests/qemu-iotests/278.out | 197 +
>  tests/qemu-iotests/group   |   1 +
>  4 files changed, 288 insertions(+), 2 deletions(-)
>  create mode 100755 tests/qemu-iotests/278
>  create mode 100644 tests/qemu-iotests/278.out
> 
> diff --git a/block/qcow2.c b/block/qcow2.c
> index 7c18721741..dcfdd200fc 100644
> --- a/block/qcow2.c
> +++ b/block/qcow2.c
> @@ -140,8 +140,8 @@ static ssize_t qcow2_crypto_hdr_init_func(QCryptoBlock 
> *block, size_t headerlen,

There’s a comment right here that reads:

> /* Zero fill remaining space in cluster so it has predictable
>  * content in case of future spec changes */

I think that should be adjusted to reflect the change.
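
For reference, the difference between the old and the new zeroed range can be sketched like this. It is a standalone model, not the qcow2 code: hdr_offset stands in for the value held in ret, and the helper names are made up.

```c
#include <assert.h>
#include <stdint.h>

struct zero_range {
    uint64_t offset;
    uint64_t bytes;
};

/* Round len up to the next multiple of cluster_size. */
static uint64_t round_up_to_cluster(uint64_t len, uint64_t cluster_size)
{
    assert(cluster_size > 0);
    return (len + cluster_size - 1) / cluster_size * cluster_size;
}

/* Old behaviour: zero only the tail between header end and cluster end. */
static struct zero_range zero_tail_only(uint64_t hdr_offset, uint64_t headerlen,
                                        uint64_t cluster_size)
{
    uint64_t clusterlen = round_up_to_cluster(headerlen, cluster_size);
    return (struct zero_range){ hdr_offset + headerlen, clusterlen - headerlen };
}

/* New behaviour: zero the whole allocated header region. */
static struct zero_range zero_whole_header(uint64_t hdr_offset,
                                           uint64_t headerlen,
                                           uint64_t cluster_size)
{
    uint64_t clusterlen = round_up_to_cluster(headerlen, cluster_size);
    return (struct zero_range){ hdr_offset, clusterlen };
}
```

With the 2068480-byte header and 2k clusters, the old range is empty because the header ends exactly on a cluster boundary, so nothing past the first key slot was ever written. With 64k clusters the old code happened to zero the last 28672 bytes, which is what masked the problem.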

>  clusterlen = size_to_clusters(s, headerlen) * s->cluster_size;
>  assert(qcow2_pre_write_overlap_check(bs, 0, ret, clusterlen, false) == 
> 0);
>  ret = bdrv_pwrite_zeroes(bs->file,
> - ret + headerlen,
> - clusterlen - headerlen, 0);
> + ret,
> + clusterlen, 0);
>  if (ret < 0) {
>  error_setg_errno(errp, -ret, "Could not zero fill encryption 
> header");
>  return -1;
> diff --git a/tests/qemu-iotests/278 b/tests/qemu-iotests/278
> new file mode 100755
> index 00..c52f03dd52
> --- /dev/null
> +++ b/tests/qemu-iotests/278
> @@ -0,0 +1,88 @@

[...]

> +IMGSPEC="driver=$IMGFMT,file.filename=$TEST_IMG,encrypt.key-secret=sec0"
> +QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT
> +
> +_run_test()
> +{
> +echo

The indentation is off here.

> + echo "== cluster size $csize"

Or maybe it really is off here, because starting from here everything is
indented with tabs.

> + echo "== checking image refcounts =="
> + $QEMU

[PATCH for-5.0 04/31] block: Pass BdrvChildRole to bdrv_child_perm()

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c | 22 --
 block/backup-top.c  |  3 ++-
 block/blkdebug.c|  5 +++--
 block/blklogwrites.c|  9 +
 block/commit.c  |  1 +
 block/copy-on-read.c|  1 +
 block/mirror.c  |  1 +
 block/quorum.c  |  1 +
 block/replication.c |  1 +
 block/vvfat.c   |  1 +
 include/block/block_int.h   |  5 -
 tests/test-bdrv-drain.c |  5 +++--
 tests/test-bdrv-graph-mod.c |  1 +
 13 files changed, 36 insertions(+), 20 deletions(-)

diff --git a/block.c b/block.c
index fc3994820f..90974ae36b 100644
--- a/block.c
+++ b/block.c
@@ -1764,12 +1764,12 @@ bool bdrv_is_writable(BlockDriverState *bs)
 
 static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
 BdrvChild *c, const BdrvChildClass *child_class,
-BlockReopenQueue *reopen_queue,
+BdrvChildRole role, BlockReopenQueue *reopen_queue,
 uint64_t parent_perm, uint64_t parent_shared,
 uint64_t *nperm, uint64_t *nshared)
 {
 assert(bs->drv && bs->drv->bdrv_child_perm);
-bs->drv->bdrv_child_perm(bs, c, child_class, reopen_queue,
+bs->drv->bdrv_child_perm(bs, c, child_class, role, reopen_queue,
  parent_perm, parent_shared,
  nperm, nshared);
 /* TODO Take force_share from reopen_queue */
@@ -1863,7 +1863,7 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
 uint64_t cur_perm, cur_shared;
 bool child_tighten_restr;
 
-bdrv_child_perm(bs, c->bs, c, c->klass, q,
+bdrv_child_perm(bs, c->bs, c, c->klass, c->role, q,
 cumulative_perms, cumulative_shared_perms,
 &cur_perm, &cur_shared);
 ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared, ignore_children,
@@ -1930,7 +1930,7 @@ static void bdrv_set_perm(BlockDriverState *bs, uint64_t cumulative_perms,
 /* Update all children */
 QLIST_FOREACH(c, &bs->children, next) {
 uint64_t cur_perm, cur_shared;
-bdrv_child_perm(bs, c->bs, c, c->klass, NULL,
+bdrv_child_perm(bs, c->bs, c, c->klass, c->role, NULL,
 cumulative_perms, cumulative_shared_perms,
 &cur_perm, &cur_shared);
 bdrv_child_set_perm(c, cur_perm, cur_shared);
@@ -2157,14 +2157,15 @@ int bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp)
 uint64_t perms, shared;
 
 bdrv_get_cumulative_perm(bs, &parent_perms, &parent_shared);
-bdrv_child_perm(bs, c->bs, c, c->klass, NULL, parent_perms, parent_shared,
-&perms, &shared);
+bdrv_child_perm(bs, c->bs, c, c->klass, c->role, NULL,
+parent_perms, parent_shared, &perms, &shared);
 
 return bdrv_child_try_set_perm(c, perms, shared, errp);
 }
 
 void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildClass *child_class,
+   BdrvChildRole role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
@@ -2175,6 +2176,7 @@ void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
 
 void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildClass *child_class,
+   BdrvChildRole role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
@@ -2187,7 +2189,7 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 
 /* Apart from the modifications below, the same permissions are
  * forwarded and left alone as for filters */
-bdrv_filter_default_perms(bs, c, child_class, reopen_queue,
+bdrv_filter_default_perms(bs, c, child_class, role, reopen_queue,
   perm, shared, &perm, &shared);
 
 /* Format drivers may touch metadata even if the guest doesn't write */
@@ -2463,7 +2465,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 bdrv_get_cumulative_perm(parent_bs, &perm, &shared_perm);
 
 assert(parent_bs->drv);
-bdrv_child_perm(parent_bs, child_bs, NULL, child_class, NULL,
+bdrv_child_perm(parent_bs, child_bs, NULL, child_class, child_role, NULL,
 perm, shared_perm, &perm, &shared_perm);
 
 child = bdrv_root_attach_child(child_bs, child_name, child_class,
@@ -3501,7 +3503,7 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 if (state->replace_backing

[PATCH for-5.0 00/31] block: Introduce real BdrvChildRole

2019-11-27 Thread Max Reitz

Based-on: <2019160216.197086-1-mre...@redhat.com>
(“block: Fix check_to_replace_node()”)

This is preliminary work for v7 of “Deal with filters”.  As Kevin has
noted, there may be e.g. multiple storage children, and there should
probably be some way for drivers to signal what they use each child for.

Before this series, this is done in a way with the child_* BdrvChildRole
objects (i.e., child_file, child_format, child_backing).  However, they
don’t really suit that task, for multiple reasons:
(1) They don’t formally mean anything.
(2) Drivers may or may not use them.  We have tests that just copy
child_backing and overwrite a single callback to suit their need.
(3) You can’t combine them (e.g. for children that store both data and
metadata).

The current BdrvChildRole structure is really just a way to contact the
parent about any changes regarding the child, so it doesn’t describe a
role but a class.  Hence this series renames it to BdrvChildClass.

Then we can introduce a real BdrvChildRole, which is an enum that
captures the roles a child can have in a bit field, so they can be
combined.

It turns out that we can use this role to unify child_file,
child_format, and child_backing into a generic child_of_bds class that
can decide what to do for each child (e.g. when it comes to flag
inheritance) based on the BdrvChildRole.

This also applies to bdrv_format_default_perms() and
bdrv_filter_default_perms(): We can unify them in a generic
bdrv_default_perms() that has different paths for filtered, backing,
metadata, and pure-data children.
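
As a rough illustration of the idea (not code from this series), a unified permission function could pick its path from the role bits like this. The flag values mirror the enum added in patch 02; the names and the precedence order of the checks are assumptions made for this sketch.

```c
#include <assert.h>

/* Local copy of the flag values, mirroring the enum added in patch 02. */
enum child_role {
    CHILD_STAY_AT_NODE = 1 << 0,
    CHILD_DATA         = 1 << 1,
    CHILD_METADATA     = 1 << 2,
    CHILD_FILTERED     = 1 << 3,
    CHILD_COW          = 1 << 4,
    CHILD_PROTOCOL     = 1 << 5,
    CHILD_FORMAT       = 1 << 6,
    CHILD_PRIMARY      = 1 << 7,

    /* Roles combine freely because each one is a separate bit. */
    CHILD_IMAGE        = CHILD_DATA | CHILD_METADATA
                         | CHILD_PROTOCOL | CHILD_PRIMARY,
};

enum perm_path {
    PATH_FILTERED,
    PATH_BACKING,
    PATH_METADATA,
    PATH_DATA,
    PATH_NONE,
};

/*
 * Hypothetical dispatcher: choose one permission path per child.
 * The order of the checks is an assumption of this sketch.
 */
static enum perm_path pick_perm_path(unsigned role)
{
    if (role & CHILD_FILTERED) {
        return PATH_FILTERED;
    }
    if (role & CHILD_COW) {
        return PATH_BACKING;
    }
    if (role & CHILD_METADATA) {
        return PATH_METADATA;
    }
    if (role & CHILD_DATA) {
        return PATH_DATA;
    }
    return PATH_NONE;
}
```

A child that stores both data and metadata takes the metadata path here, which is exactly the kind of combination the old child_* objects could not express.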


Max Reitz (31):
  block: Rename BdrvChildRole to BdrvChildClass
  block: Add BdrvChildRole
  block: Add BdrvChildRole to BdrvChild
  block: Pass BdrvChildRole to bdrv_child_perm()
  block: Drop BdrvChildClass.stay_at_node
  block: Keep BDRV_O_NO_IO in *inherited_fmt_options
  block: Pass BdrvChildRole to .inherit_options()
  block: Unify bdrv_*inherited_options()
  block: Unify bdrv_child_cb_attach()
  block: Unify bdrv_child_cb_detach()
  block: Add child_of_bds
  block: Distinguish paths in *_format_default_perms
  block: Pull out bdrv_default_perms_for_backing()
  block: Pull out bdrv_default_perms_for_storage()
  block: Split bdrv_default_perms_for_storage()
  block: Add bdrv_default_perms()
  raw-format: Split raw_read_options()
  block: Switch child_format users to child_of_bds
  block: Drop child_format
  block: Make backing files child_of_bds children
  block: Drop child_backing
  block: Make format drivers use child_of_bds
  block: Make filter drivers use child_of_bds
  block: Use child_of_bds in remaining places
  tests: Use child_of_bds instead of child_file
  block: Use bdrv_default_perms()
  block: Make bdrv_filter_default_perms() static
  block: Drop bdrv_format_default_perms()
  block: Drop child_file
  block: Pass BdrvChildRole in remaining cases
  block: Drop @child_class from bdrv_child_perm()

 block.c | 477 +---
 block/backup-top.c  |  11 +-
 block/blkdebug.c|  10 +-
 block/blklogwrites.c|  17 +-
 block/blkreplay.c   |   7 +-
 block/blkverify.c   |  12 +-
 block/block-backend.c   |  20 +-
 block/bochs.c   |   6 +-
 block/cloop.c   |   6 +-
 block/commit.c  |   2 +-
 block/copy-on-read.c|   7 +-
 block/crypto.c  |   6 +-
 block/dmg.c |   6 +-
 block/io.c  |  22 +-
 block/mirror.c  |   2 +-
 block/parallels.c   |   6 +-
 block/qcow.c|   6 +-
 block/qcow2.c   |  20 +-
 block/qed.c |   6 +-
 block/quorum.c  |  11 +-
 block/raw-format.c  | 128 ++
 block/replication.c |   5 +-
 block/throttle.c|   7 +-
 block/vdi.c |   6 +-
 block/vhdx.c|   6 +-
 block/vmdk.c|  22 +-
 block/vpc.c |   6 +-
 block/vvfat.c   |  11 +-
 blockjob.c  |   8 +-
 include/block/block.h   |  46 +++-
 include/block/block_int.h   |  54 ++--
 tests/test-bdrv-drain.c |  72 +++---
 tests/test-bdrv-graph-mod.c |  10 +-
 tests/test-block-iothread.c |  17 +-
 34 files changed, 625 insertions(+), 433 deletions(-)

-- 
2.23.0

[PATCH for-5.0 11/31] block: Add child_of_bds

2019-11-27 Thread Max Reitz

Any current user of child_file, child_format, and child_backing can and
should use this generic BdrvChildClass instead, as it can handle all of
these cases.  However, to be able to do so, the users must pass the
appropriate BdrvChildRole when the child is created/attached.  (The
following commits will take care of that.)

Signed-off-by: Max Reitz 
---
 block.c   | 27 +++
 include/block/block_int.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/block.c b/block.c
index 89214efa36..8542768d35 100644
--- a/block.c
+++ b/block.c
@@ -1050,6 +1050,33 @@ static void bdrv_inherited_options(BdrvChildRole role,
 *child_flags = flags;
 }
 
+static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
+const char *filename, Error **errp);
+
+static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
+ const char *filename, Error **errp)
+{
+if (c->role & BDRV_CHILD_COW) {
+return bdrv_backing_update_filename(c, base, filename, errp);
+}
+return 0;
+}
+
+const BdrvChildClass child_of_bds = {
+.parent_is_bds   = true,
+.get_parent_desc = bdrv_child_get_parent_desc,
+.inherit_options = bdrv_inherited_options,
+.drained_begin   = bdrv_child_cb_drained_begin,
+.drained_poll= bdrv_child_cb_drained_poll,
+.drained_end = bdrv_child_cb_drained_end,
+.attach  = bdrv_child_cb_attach,
+.detach  = bdrv_child_cb_detach,
+.inactivate  = bdrv_child_cb_inactivate,
+.can_set_aio_ctx = bdrv_child_cb_can_set_aio_ctx,
+.set_aio_ctx = bdrv_child_cb_set_aio_ctx,
+.update_filename = bdrv_child_cb_update_filename,
+};
+
 /*
  * Returns the options and flags that bs->file should get if a protocol driver
  * is expected, based on the given options and flags for the parent BDS
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7553faa5cf..f2f8d770c6 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -729,6 +729,7 @@ struct BdrvChildClass {
 void (*set_aio_ctx)(BdrvChild *child, AioContext *ctx, GSList **ignore);
 };
 
+extern const BdrvChildClass child_of_bds;
 extern const BdrvChildClass child_file;
 extern const BdrvChildClass child_format;
 extern const BdrvChildClass child_backing;
-- 
2.23.0

[PATCH for-5.0 02/31] block: Add BdrvChildRole

2019-11-27 Thread Max Reitz

This enum will supplement BdrvChildClass when it comes to what role (or
combination of roles) a child takes for its parent.

Because empty enums are not allowed, let us just start with it filled.

Signed-off-by: Max Reitz 
---
 include/block/block.h | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/include/block/block.h b/include/block/block.h
index 38963ef203..36817d5689 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -279,6 +279,44 @@ enum {
 DEFAULT_PERM_UNCHANGED  = BLK_PERM_ALL & ~DEFAULT_PERM_PASSTHROUGH,
 };
 
+typedef enum BdrvChildRole {
+/*
+ * If present, bdrv_replace_node() will not change the node this
+ * BdrvChild points to.
+ */
+BDRV_CHILD_STAY_AT_NODE = (1 << 0),
+
+/* Child stores data */
+BDRV_CHILD_DATA = (1 << 1),
+
+/* Child stores metadata */
+BDRV_CHILD_METADATA = (1 << 2),
+
+/* Filtered child */
+BDRV_CHILD_FILTERED = (1 << 3),
+
+/* Child to COW from (backing child) */
+BDRV_CHILD_COW  = (1 << 4),
+
+/* Child is expected to be a protocol node */
+BDRV_CHILD_PROTOCOL = (1 << 5),
+
+/* Child is expected to be a format node */
+BDRV_CHILD_FORMAT   = (1 << 6),
+
+/*
+ * The primary child.  For most drivers, this is the child whose
+ * filename applies best to the parent node.
+ */
+BDRV_CHILD_PRIMARY  = (1 << 7),
+
+/* Useful combination of flags */
+BDRV_CHILD_IMAGE= BDRV_CHILD_DATA
+  | BDRV_CHILD_METADATA
+  | BDRV_CHILD_PROTOCOL
+  | BDRV_CHILD_PRIMARY,
+} BdrvChildRole;
+
 char *bdrv_perm_names(uint64_t perm);
 uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm);
 
-- 
2.23.0

[PATCH for-5.0 03/31] block: Add BdrvChildRole to BdrvChild

2019-11-27 Thread Max Reitz

For now, it is always set to 0.  Later patches in this series will
ensure that all callers pass an appropriate combination of flags.

Signed-off-by: Max Reitz 
---
 block.c | 11 ---
 block/backup-top.c  |  3 ++-
 block/blkdebug.c|  2 +-
 block/blklogwrites.c|  6 +++---
 block/blkreplay.c   |  2 +-
 block/blkverify.c   |  4 ++--
 block/block-backend.c   |  4 ++--
 block/bochs.c   |  2 +-
 block/cloop.c   |  2 +-
 block/copy-on-read.c|  2 +-
 block/crypto.c  |  2 +-
 block/dmg.c |  2 +-
 block/parallels.c   |  2 +-
 block/qcow.c|  2 +-
 block/qcow2.c   |  6 +++---
 block/qed.c |  2 +-
 block/quorum.c  |  4 ++--
 block/raw-format.c  |  2 +-
 block/replication.c |  2 +-
 block/throttle.c|  2 +-
 block/vdi.c |  2 +-
 block/vhdx.c|  2 +-
 block/vmdk.c|  4 ++--
 block/vpc.c |  2 +-
 block/vvfat.c   |  2 +-
 blockjob.c  |  5 +++--
 include/block/block.h   |  2 ++
 include/block/block_int.h   |  2 ++
 tests/test-bdrv-drain.c | 20 +++-
 tests/test-bdrv-graph-mod.c |  4 ++--
 30 files changed, 61 insertions(+), 48 deletions(-)

diff --git a/block.c b/block.c
index 7c16cf2fe6..fc3994820f 100644
--- a/block.c
+++ b/block.c
@@ -2380,6 +2380,7 @@ static void bdrv_replace_child(BdrvChild *child, BlockDriverState *new_bs)
 BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
   const char *child_name,
   const BdrvChildClass *child_class,
+  BdrvChildRole child_role,
   AioContext *ctx,
   uint64_t perm, uint64_t shared_perm,
   void *opaque, Error **errp)
@@ -2401,6 +2402,7 @@ BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
 .bs = NULL,
 .name   = g_strdup(child_name),
 .klass  = child_class,
+.role   = child_role,
 .perm   = perm,
 .shared_perm= shared_perm,
 .opaque = opaque,
@@ -2452,6 +2454,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
  BlockDriverState *child_bs,
  const char *child_name,
  const BdrvChildClass *child_class,
+ BdrvChildRole child_role,
  Error **errp)
 {
 BdrvChild *child;
@@ -2464,7 +2467,7 @@ BdrvChild *bdrv_attach_child(BlockDriverState *parent_bs,
 perm, shared_perm, &perm, &shared_perm);
 
 child = bdrv_root_attach_child(child_bs, child_name, child_class,
-   bdrv_get_aio_context(parent_bs),
+   child_role, bdrv_get_aio_context(parent_bs),
perm, shared_perm, parent_bs, errp);
 if (child == NULL) {
 return NULL;
@@ -2585,7 +2588,7 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
 }
 
 bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_backing,
-errp);
+0, errp);
 /* If backing_hd was already part of bs's backing chain, and
  * inherits_from pointed recursively to bs then let's update it to
  * point directly to bs (else it will become NULL). */
@@ -2776,6 +2779,7 @@ BdrvChild *bdrv_open_child(const char *filename,
QDict *options, const char *bdref_key,
BlockDriverState *parent,
const BdrvChildClass *child_class,
+   BdrvChildRole child_role,
bool allow_none, Error **errp)
 {
 BlockDriverState *bs;
@@ -2786,7 +2790,8 @@ BdrvChild *bdrv_open_child(const char *filename,
 return NULL;
 }
 
-return bdrv_attach_child(parent, bs, bdref_key, child_class, errp);
+return bdrv_attach_child(parent, bs, bdref_key, child_class, child_role,
+ errp);
 }
 
 /*
diff --git a/block/backup-top.c b/block/backup-top.c
index e2ded7f570..273d41b752 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -199,7 +199,8 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
 top->opaque = state = g_new0(BDRVBackupTopState, 1);
 
 bdrv_ref(target);
-state->target = bdrv_attach_child(top, target, "target", &child_file, errp);
+state->target = bdrv_attach_child(top, target, "target", &child_file, 0,
+  errp);
 if (!state->target) {
 bdrv_unref(target);
 bdrv_unre

[PATCH for-5.0 01/31] block: Rename BdrvChildRole to BdrvChildClass

2019-11-27 Thread Max Reitz

This structure nearly only contains parent callbacks for child state
changes.  It cannot really reflect a child's role, because different
roles may overlap (as we will see when real roles are introduced), and
because parents can have custom callbacks even when the child fulfills a
standard role.

Signed-off-by: Max Reitz 
---
 block.c | 142 ++--
 block/backup-top.c  |   8 +-
 block/blkdebug.c|   4 +-
 block/blklogwrites.c|   8 +-
 block/block-backend.c   |   6 +-
 block/commit.c  |   2 +-
 block/copy-on-read.c|   2 +-
 block/io.c  |  22 +++---
 block/mirror.c  |   2 +-
 block/quorum.c  |   2 +-
 block/replication.c |   2 +-
 block/vvfat.c   |   6 +-
 blockjob.c  |   2 +-
 include/block/block.h   |   6 +-
 include/block/block_int.h   |  22 +++---
 tests/test-bdrv-drain.c |  36 -
 tests/test-bdrv-graph-mod.c |   2 +-
 17 files changed, 141 insertions(+), 133 deletions(-)

diff --git a/block.c b/block.c
index 36015f1b7b..7c16cf2fe6 100644
--- a/block.c
+++ b/block.c
@@ -76,7 +76,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
const char *reference,
QDict *options, int flags,
BlockDriverState *parent,
-   const BdrvChildRole *child_role,
+   const BdrvChildClass *child_class,
Error **errp);
 
 /* If non-zero, use only whitelisted block drivers */
@@ -1009,7 +1009,7 @@ static void bdrv_inherited_options(int *child_flags, QDict *child_options,
 *child_flags = flags;
 }
 
-const BdrvChildRole child_file = {
+const BdrvChildClass child_file = {
 .parent_is_bds   = true,
 .get_parent_desc = bdrv_child_get_parent_desc,
 .inherit_options = bdrv_inherited_options,
@@ -1037,7 +1037,7 @@ static void bdrv_inherited_fmt_options(int *child_flags, QDict *child_options,
 *child_flags &= ~(BDRV_O_PROTOCOL | BDRV_O_NO_IO);
 }
 
-const BdrvChildRole child_format = {
+const BdrvChildClass child_format = {
 .parent_is_bds   = true,
 .get_parent_desc = bdrv_child_get_parent_desc,
 .inherit_options = bdrv_inherited_fmt_options,
@@ -1161,7 +1161,7 @@ static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
 return ret;
 }
 
-const BdrvChildRole child_backing = {
+const BdrvChildClass child_backing = {
 .parent_is_bds   = true,
 .get_parent_desc = bdrv_child_get_parent_desc,
 .attach  = bdrv_backing_attach,
@@ -1763,13 +1763,13 @@ bool bdrv_is_writable(BlockDriverState *bs)
 }
 
 static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
-BdrvChild *c, const BdrvChildRole *role,
+BdrvChild *c, const BdrvChildClass *child_class,
 BlockReopenQueue *reopen_queue,
 uint64_t parent_perm, uint64_t parent_shared,
 uint64_t *nperm, uint64_t *nshared)
 {
 assert(bs->drv && bs->drv->bdrv_child_perm);
-bs->drv->bdrv_child_perm(bs, c, role, reopen_queue,
+bs->drv->bdrv_child_perm(bs, c, child_class, reopen_queue,
  parent_perm, parent_shared,
  nperm, nshared);
 /* TODO Take force_share from reopen_queue */
@@ -1863,7 +1863,7 @@ static int bdrv_check_perm(BlockDriverState *bs, BlockReopenQueue *q,
 uint64_t cur_perm, cur_shared;
 bool child_tighten_restr;
 
-bdrv_child_perm(bs, c->bs, c, c->role, q,
+bdrv_child_perm(bs, c->bs, c, c->klass, q,
 cumulative_perms, cumulative_shared_perms,
 &cur_perm, &cur_shared);
 ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared, ignore_children,
@@ -1930,7 +1930,7 @@ static void bdrv_set_perm(BlockDriverState *bs, uint64_t cumulative_perms,
 /* Update all children */
 QLIST_FOREACH(c, &bs->children, next) {
 uint64_t cur_perm, cur_shared;
-bdrv_child_perm(bs, c->bs, c, c->role, NULL,
+bdrv_child_perm(bs, c->bs, c, c->klass, NULL,
 cumulative_perms, cumulative_shared_perms,
 &cur_perm, &cur_shared);
 bdrv_child_set_perm(c, cur_perm, cur_shared);
@@ -1955,8 +1955,8 @@ static void bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
 
 static char *bdrv_child_user_desc(BdrvChild *c)
 {
-if (c->role->get_parent_desc) {
-return c->role->get_parent_desc(c);
+if (c->klass->get_parent_desc) {
+return c->klass->get_parent_desc(c);
 }
 
 return g_strdup("another user");
@@ -2157,14 +2157,14 @@ int bdrv_child_refresh_perms(BlockD

[PATCH for-5.0 13/31] block: Pull out bdrv_default_perms_for_backing()

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c | 62 +
 1 file changed, 40 insertions(+), 22 deletions(-)

diff --git a/block.c b/block.c
index eb282f0977..2771bc45ce 100644
--- a/block.c
+++ b/block.c
@@ -2256,6 +2256,44 @@ void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
 *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED;
 }
 
+static void bdrv_default_perms_for_backing(BlockDriverState *bs, BdrvChild *c,
+   const BdrvChildClass *child_class,
+   BdrvChildRole role,
+   BlockReopenQueue *reopen_queue,
+   uint64_t perm, uint64_t shared,
+   uint64_t *nperm, uint64_t *nshared)
+{
+assert(child_class == &child_backing ||
+   (child_class == &child_of_bds && (role & BDRV_CHILD_COW)));
+
+/*
+ * We want consistent read from backing files if the parent needs it.
+ * No other operations are performed on backing files.
+ */
+perm &= BLK_PERM_CONSISTENT_READ;
+
+/*
+ * If the parent can deal with changing data, we're okay with a
+ * writable and resizable backing file.
+ * TODO Require !(perm & BLK_PERM_CONSISTENT_READ), too?
+ */
+if (shared & BLK_PERM_WRITE) {
+shared = BLK_PERM_WRITE | BLK_PERM_RESIZE;
+} else {
+shared = 0;
+}
+
+shared |= BLK_PERM_CONSISTENT_READ | BLK_PERM_GRAPH_MOD |
+  BLK_PERM_WRITE_UNCHANGED;
+
+if (bs->open_flags & BDRV_O_INACTIVE) {
+shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+}
+
+*nperm = perm;
+*nshared = shared;
+}
+
 void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildClass *child_class,
BdrvChildRole role,
@@ -2293,28 +2331,8 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 *nperm = perm;
 *nshared = shared;
 } else {
-/* We want consistent read from backing files if the parent needs it.
- * No other operations are performed on backing files. */
-perm &= BLK_PERM_CONSISTENT_READ;
-
-/* If the parent can deal with changing data, we're okay with a
- * writable and resizable backing file. */
-/* TODO Require !(perm & BLK_PERM_CONSISTENT_READ), too? */
-if (shared & BLK_PERM_WRITE) {
-shared = BLK_PERM_WRITE | BLK_PERM_RESIZE;
-} else {
-shared = 0;
-}
-
-shared |= BLK_PERM_CONSISTENT_READ | BLK_PERM_GRAPH_MOD |
-  BLK_PERM_WRITE_UNCHANGED;
-
-if (bs->open_flags & BDRV_O_INACTIVE) {
-shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
-}
-
-*nperm = perm;
-*nshared = shared;
+bdrv_default_perms_for_backing(bs, c, child_class, role, reopen_queue,
+   perm, shared, nperm, nshared);
 }
 }
 
-- 
2.23.0
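
The permission rules that this patch moves into bdrv_default_perms_for_backing() can be tried out in isolation. The following is only a sketch on plain integers: the PERM_* bits and the backing_perms() helper are hypothetical stand-ins (not QEMU's real definitions), and the BDRV_O_INACTIVE check is reduced to a boolean parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in permission bits; the values are illustrative only. */
enum {
    PERM_CONSISTENT_READ = 1u << 0,
    PERM_WRITE           = 1u << 1,
    PERM_WRITE_UNCHANGED = 1u << 2,
    PERM_RESIZE          = 1u << 3,
    PERM_GRAPH_MOD       = 1u << 4,
};

/* Mirrors the logic of bdrv_default_perms_for_backing() on plain integers. */
static void backing_perms(uint64_t perm, uint64_t shared, bool inactive,
                          uint64_t *nperm, uint64_t *nshared)
{
    /* Only consistent reads are ever requested from a backing file. */
    perm &= PERM_CONSISTENT_READ;

    /* A writable, resizable backing file is fine only if the parent
     * can deal with changing data. */
    shared = (shared & PERM_WRITE) ? (PERM_WRITE | PERM_RESIZE) : 0;
    shared |= PERM_CONSISTENT_READ | PERM_GRAPH_MOD | PERM_WRITE_UNCHANGED;

    /* Inactive nodes additionally share write and resize. */
    if (inactive) {
        shared |= PERM_WRITE | PERM_RESIZE;
    }

    *nperm = perm;
    *nshared = shared;
}
```

With these stand-ins, a parent that requests read and write but shares only consistent read ends up requesting just consistent read from its backing child, and does not share write or resize.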

[PATCH for-5.0 05/31] block: Drop BdrvChildClass.stay_at_node

2019-11-27 Thread Max Reitz

This flag belongs in BdrvChildRole, so that parents can decide for each
child whether or not to keep the child node fixed.

Signed-off-by: Max Reitz 
---
 block.c   | 2 +-
 blockjob.c| 3 +--
 include/block/block_int.h | 4 
 3 files changed, 2 insertions(+), 7 deletions(-)

diff --git a/block.c b/block.c
index 90974ae36b..6c50ad661e 100644
--- a/block.c
+++ b/block.c
@@ -4103,7 +4103,7 @@ static bool should_update_child(BdrvChild *c, BlockDriverState *to)
 GHashTable *found;
 bool ret;
 
-if (c->klass->stay_at_node) {
+if (c->role & BDRV_CHILD_STAY_AT_NODE) {
 return false;
 }
 
diff --git a/blockjob.c b/blockjob.c
index e7dbb4093a..f58356fb6c 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -170,7 +170,6 @@ static const BdrvChildClass child_job = {
 .drained_end= child_job_drained_end,
 .can_set_aio_ctx= child_job_can_set_aio_ctx,
 .set_aio_ctx= child_job_set_aio_ctx,
-.stay_at_node   = true,
 };
 
 void block_job_remove_all_bdrv(BlockJob *job)
@@ -217,7 +216,7 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
 if (job->job.aio_context != qemu_get_aio_context()) {
 aio_context_release(job->job.aio_context);
 }
-c = bdrv_root_attach_child(bs, name, &child_job, 0,
+c = bdrv_root_attach_child(bs, name, &child_job, BDRV_CHILD_STAY_AT_NODE,
job->job.aio_context, perm, shared_perm, job,
errp);
 if (job->job.aio_context != qemu_get_aio_context()) {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 85cfa4b069..102ce7853e 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -660,10 +660,6 @@ typedef struct BdrvAioNotifier {
 } BdrvAioNotifier;
 
 struct BdrvChildClass {
-/* If true, bdrv_replace_node() doesn't change the node this BdrvChild
- * points to. */
-bool stay_at_node;
-
 /* If true, the parent is a BlockDriverState and bdrv_next_all_states()
  * will return it. This information is used for drain_all, where every node
  * will be drained separately, so the drain only needs to be propagated to
-- 
2.23.0
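
As a rough illustration of the change above: once "stay at node" is a role bit rather than a field of the child class, each attachment carries its own answer. The Child struct, the CHILD_* values and child_may_be_moved() below are hypothetical and only model the early-return check in should_update_child().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role bits; names echo the patch, values are made up. */
typedef unsigned int ChildRole;
#define CHILD_DATA          (1u << 0)
#define CHILD_STAY_AT_NODE  (1u << 1)

typedef struct Child {
    ChildRole role;   /* chosen by the parent when it attaches the child */
} Child;

/* Counterpart of the early-return check in should_update_child(). */
static bool child_may_be_moved(const Child *c)
{
    return !(c->role & CHILD_STAY_AT_NODE);
}
```

Two children that share the same class can now differ: one attached with CHILD_STAY_AT_NODE stays put on node replacement, one attached without it follows the replacement.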

[PATCH for-5.0 15/31] block: Split bdrv_default_perms_for_storage()

2019-11-27 Thread Max Reitz

We can be less restrictive about pure data children than those with
metadata on them.

For bdrv_format_default_perms(), we keep the safe option of
bdrv_default_perms_for_metadata() (until we drop
bdrv_format_default_perms() altogether).

That means that bdrv_default_perms_for_data() is unused at this point.
We will use it for all children that have the DATA role, but not the
METADATA role.  So far, no child has any role, so we do not use it, but
that will change.

Signed-off-by: Max Reitz 
---
 block.c | 53 +++--
 1 file changed, 43 insertions(+), 10 deletions(-)

diff --git a/block.c b/block.c
index 4d4ccbacdf..33abd7f64e 100644
--- a/block.c
+++ b/block.c
@@ -2294,18 +2294,17 @@ static void bdrv_default_perms_for_backing(BlockDriverState *bs, BdrvChild *c,
 *nshared = shared;
 }
 
-static void bdrv_default_perms_for_storage(BlockDriverState *bs, BdrvChild *c,
-   const BdrvChildClass *child_class,
-   BdrvChildRole role,
-   BlockReopenQueue *reopen_queue,
-   uint64_t perm, uint64_t shared,
-   uint64_t *nperm, uint64_t *nshared)
+static void bdrv_default_perms_for_metadata(BlockDriverState *bs, BdrvChild *c,
+const BdrvChildClass *child_class,
+BdrvChildRole role,
+BlockReopenQueue *reopen_queue,
+uint64_t perm, uint64_t shared,
+uint64_t *nperm, uint64_t *nshared)
 {
 int flags;
 
 assert(child_class == &child_file ||
-   (child_class == &child_of_bds &&
-(role & (BDRV_CHILD_METADATA | BDRV_CHILD_DATA))));
+   (child_class == &child_of_bds && (role & BDRV_CHILD_METADATA)));
 
 flags = bdrv_reopen_get_flags(reopen_queue, bs);
 
@@ -2338,6 +2337,40 @@ static void bdrv_default_perms_for_storage(BlockDriverState *bs, BdrvChild *c,
 *nshared = shared;
 }
 
+/* TODO: Use */
+static void __attribute__((unused))
+bdrv_default_perms_for_data(BlockDriverState *bs, BdrvChild *c,
+const BdrvChildClass *child_class,
+BdrvChildRole role,
+BlockReopenQueue *reopen_queue,
+uint64_t perm, uint64_t shared,
+uint64_t *nperm, uint64_t *nshared)
+{
+assert(child_class == &child_of_bds && (role & BDRV_CHILD_DATA));
+
+/*
+ * Apart from the modifications below, the same permissions are
+ * forwarded and left alone as for filters
+ */
+bdrv_filter_default_perms(bs, c, child_class, role, reopen_queue,
+  perm, shared, &perm, &shared);
+
+/*
+ * We cannot allow other users to resize the file because the
+ * format driver might have some assumptions about the size
+ * (e.g. because it is stored in metadata, or because the file is
+ * split into fixed-size data files).
+ */
+shared &= ~BLK_PERM_RESIZE;
+
+if (bs->open_flags & BDRV_O_INACTIVE) {
+shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+}
+
+*nperm = perm;
+*nshared = shared;
+}
+
 void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildClass *child_class,
BdrvChildRole role,
@@ -2349,8 +2382,8 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 assert(child_class == &child_backing || child_class == &child_file);
 
 if (!backing) {
-bdrv_default_perms_for_storage(bs, c, child_class, role, reopen_queue,
-   perm, shared, nperm, nshared);
+bdrv_default_perms_for_metadata(bs, c, child_class, role, reopen_queue,
+perm, shared, nperm, nshared);
 } else {
 bdrv_default_perms_for_backing(bs, c, child_class, role, reopen_queue,
perm, shared, nperm, nshared);
-- 
2.23.0
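
The adjustment that bdrv_default_perms_for_data() applies after the filter defaults can be sketched as follows. This is a simplification under stated assumptions: the PERM_* values are stand-ins, the result of bdrv_filter_default_perms() is taken as an input (filter_shared) rather than recomputed, and data_child_shared() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in permission bits; the values are illustrative only. */
#define PERM_WRITE   (UINT64_C(1) << 1)
#define PERM_RESIZE  (UINT64_C(1) << 3)

/*
 * Given the shared permissions the filter defaults would produce,
 * return what a pure data child shares with other users.
 */
static uint64_t data_child_shared(uint64_t filter_shared, bool inactive)
{
    /* Other users must not resize the file behind the format driver. */
    uint64_t shared = filter_shared & ~PERM_RESIZE;

    /* Inactive nodes share write and resize regardless. */
    if (inactive) {
        shared |= PERM_WRITE | PERM_RESIZE;
    }
    return shared;
}
```

The only difference from the filter defaults in this sketch is the cleared RESIZE bit, which matches the commit message's point that pure data children can be treated less restrictively than metadata children.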

[PATCH for-5.0 07/31] block: Pass BdrvChildRole to .inherit_options()

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c   | 40 +++
 block/block-backend.c |  3 ++-
 block/vvfat.c |  3 ++-
 include/block/block_int.h |  3 ++-
 4 files changed, 30 insertions(+), 19 deletions(-)

diff --git a/block.c b/block.c
index 58252007af..f42682478e 100644
--- a/block.c
+++ b/block.c
@@ -77,6 +77,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
QDict *options, int flags,
BlockDriverState *parent,
const BdrvChildClass *child_class,
+   BdrvChildRole child_role,
Error **errp);
 
 /* If non-zero, use only whitelisted block drivers */
@@ -979,7 +980,8 @@ static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
  * Returns the options and flags that bs->file should get if a protocol driver
  * is expected, based on the given options and flags for the parent BDS
  */
-static void bdrv_inherited_options(int *child_flags, QDict *child_options,
+static void bdrv_inherited_options(BdrvChildRole role,
+   int *child_flags, QDict *child_options,
int parent_flags, QDict *parent_options)
 {
 int flags = parent_flags;
@@ -1028,10 +1030,11 @@ const BdrvChildClass child_file = {
  * (and not only protocols) is permitted for it, based on the given options and
  * flags for the parent BDS
  */
-static void bdrv_inherited_fmt_options(int *child_flags, QDict *child_options,
+static void bdrv_inherited_fmt_options(BdrvChildRole role,
+   int *child_flags, QDict *child_options,
int parent_flags, QDict *parent_options)
 {
-child_file.inherit_options(child_flags, child_options,
+child_file.inherit_options(role, child_flags, child_options,
parent_flags, parent_options);
 
 *child_flags &= ~BDRV_O_PROTOCOL;
@@ -1112,7 +1115,8 @@ static void bdrv_backing_detach(BdrvChild *c)
  * Returns the options and flags that bs->backing should get, based on the
  * given options and flags for the parent BDS
  */
-static void bdrv_backing_options(int *child_flags, QDict *child_options,
+static void bdrv_backing_options(BdrvChildRole role,
+ int *child_flags, QDict *child_options,
  int parent_flags, QDict *parent_options)
 {
 int flags = parent_flags;
@@ -2687,7 +2691,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
 }
 
 backing_hd = bdrv_open_inherit(backing_filename, reference, options, 0, bs,
-   &child_backing, errp);
+   &child_backing, 0, errp);
 if (!backing_hd) {
 bs->open_flags |= BDRV_O_NO_BACKING;
 error_prepend(errp, "Could not open backing file: ");
@@ -2722,7 +2726,7 @@ free_exit:
 static BlockDriverState *
 bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
BlockDriverState *parent, const BdrvChildClass *child_class,
-   bool allow_none, Error **errp)
+   BdrvChildRole child_role, bool allow_none, Error **errp)
 {
 BlockDriverState *bs = NULL;
 QDict *image_options;
@@ -2753,7 +2757,7 @@ bdrv_open_child_bs(const char *filename, QDict *options, const char *bdref_key,
 }
 
 bs = bdrv_open_inherit(filename, reference, image_options, 0,
-   parent, child_class, errp);
+   parent, child_class, child_role, errp);
 if (!bs) {
 goto done;
 }
@@ -2787,7 +2791,7 @@ BdrvChild *bdrv_open_child(const char *filename,
 BlockDriverState *bs;
 
 bs = bdrv_open_child_bs(filename, options, bdref_key, parent, child_class,
-allow_none, errp);
+child_role, allow_none, errp);
 if (bs == NULL) {
 return NULL;
 }
@@ -2836,7 +2840,7 @@ BlockDriverState *bdrv_open_blockdev_ref(BlockdevRef *ref, Error **errp)
 
 }
 
-bs = bdrv_open_inherit(NULL, reference, qdict, 0, NULL, NULL, errp);
+bs = bdrv_open_inherit(NULL, reference, qdict, 0, NULL, NULL, 0, errp);
 obj = NULL;
 
 fail:
@@ -2935,6 +2939,7 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
QDict *options, int flags,
BlockDriverState *parent,
const BdrvChildClass *child_class,
+   BdrvChildRole child_role,
Error **errp)
 {
 int ret;
@@ -2987,7 +2992,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*file

[PATCH for-5.0 16/31] block: Add bdrv_default_perms()

2019-11-27 Thread Max Reitz

This callback can be used by BDSs that use child_of_bds with the
appropriate BdrvChildRole for their children.

Also, make bdrv_format_default_perms() use it for child_of_bds children
(just a temporary solution until we can drop bdrv_format_default_perms()
altogether).

Signed-off-by: Max Reitz 
---
 block.c   | 46 ---
 include/block/block_int.h | 11 ++
 2 files changed, 49 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 33abd7f64e..5b38c7799a 100644
--- a/block.c
+++ b/block.c
@@ -2337,14 +2337,12 @@ static void bdrv_default_perms_for_metadata(BlockDriverState *bs, BdrvChild *c,
 *nshared = shared;
 }
 
-/* TODO: Use */
-static void __attribute__((unused))
-bdrv_default_perms_for_data(BlockDriverState *bs, BdrvChild *c,
-const BdrvChildClass *child_class,
-BdrvChildRole role,
-BlockReopenQueue *reopen_queue,
-uint64_t perm, uint64_t shared,
-uint64_t *nperm, uint64_t *nshared)
+static void bdrv_default_perms_for_data(BlockDriverState *bs, BdrvChild *c,
+const BdrvChildClass *child_class,
+BdrvChildRole role,
+BlockReopenQueue *reopen_queue,
+uint64_t perm, uint64_t shared,
+uint64_t *nperm, uint64_t *nshared)
 {
 assert(child_class == &child_of_bds && (role & BDRV_CHILD_DATA));
 
@@ -2379,6 +2377,13 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
uint64_t *nperm, uint64_t *nshared)
 {
 bool backing = (child_class == &child_backing);
+
+if (child_class == &child_of_bds) {
+bdrv_default_perms(bs, c, child_class, role, reopen_queue,
+   perm, shared, nperm, nshared);
+return;
+}
+
 assert(child_class == &child_backing || child_class == &child_file);
 
 if (!backing) {
@@ -2390,6 +2395,31 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 }
 }
 
+void bdrv_default_perms(BlockDriverState *bs, BdrvChild *c,
+const BdrvChildClass *child_class, BdrvChildRole role,
+BlockReopenQueue *reopen_queue,
+uint64_t perm, uint64_t shared,
+uint64_t *nperm, uint64_t *nshared)
+{
+assert(child_class == &child_of_bds);
+
+if (role & BDRV_CHILD_FILTERED) {
+bdrv_filter_default_perms(bs, c, child_class, role, reopen_queue,
+  perm, shared, nperm, nshared);
+} else if (role & BDRV_CHILD_COW) {
+bdrv_default_perms_for_backing(bs, c, child_class, role, reopen_queue,
+   perm, shared, nperm, nshared);
+} else if (role & BDRV_CHILD_METADATA) {
+bdrv_default_perms_for_metadata(bs, c, child_class, role, reopen_queue,
+perm, shared, nperm, nshared);
+} else if (role & BDRV_CHILD_DATA) {
+bdrv_default_perms_for_data(bs, c, child_class, role, reopen_queue,
+perm, shared, nperm, nshared);
+} else {
+g_assert_not_reached();
+}
+}
+
 uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
 {
 static const uint64_t permissions[] = {
diff --git a/include/block/block_int.h b/include/block/block_int.h
index f2f8d770c6..b9375ceb1c 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1272,6 +1272,17 @@ bool bdrv_recurse_can_replace(BlockDriverState *bs,
 bool bdrv_is_child_of(BlockDriverState *child, BlockDriverState *parent,
   int min_level);
 
+/*
+ * Default implementation for BlockDriver.bdrv_child_perm() that can
+ * be used by block filters and image formats, as long as they use the
+ * child_of_bds child class and set an appropriate BdrvChildRole.
+ */
+void bdrv_default_perms(BlockDriverState *bs, BdrvChild *c,
+const BdrvChildClass *child_class, BdrvChildRole role,
+BlockReopenQueue *reopen_queue,
+uint64_t perm, uint64_t shared,
+uint64_t *nperm, uint64_t *nshared);
+
 /*
  * Default implementation for drivers to pass bdrv_co_block_status() to
  * their file.
-- 
2.23.0
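
The order of the role checks in bdrv_default_perms() matters when a child carries several role bits at once. A minimal sketch of that dispatch order follows; the ROLE_* values, the PermHandler enum and pick_perm_handler() are hypothetical, and where the patch calls g_assert_not_reached() this sketch returns PERMS_NONE instead.

```c
#include <assert.h>

/* Hypothetical role bits; names follow the patch, values are invented. */
enum {
    ROLE_DATA     = 1u << 0,
    ROLE_METADATA = 1u << 1,
    ROLE_FILTERED = 1u << 2,
    ROLE_COW      = 1u << 3,
};

typedef enum PermHandler {
    PERMS_FILTER,
    PERMS_BACKING,
    PERMS_METADATA,
    PERMS_DATA,
    PERMS_NONE,      /* the patch asserts here instead of returning */
} PermHandler;

/* Same precedence as bdrv_default_perms(): FILTERED, COW, METADATA, DATA. */
static PermHandler pick_perm_handler(unsigned int role)
{
    if (role & ROLE_FILTERED) {
        return PERMS_FILTER;
    } else if (role & ROLE_COW) {
        return PERMS_BACKING;
    } else if (role & ROLE_METADATA) {
        return PERMS_METADATA;
    } else if (role & ROLE_DATA) {
        return PERMS_DATA;
    }
    return PERMS_NONE;
}
```

In particular, a child that has both the DATA and METADATA bits set is handled by the metadata rules, which are the more restrictive ones.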

[PATCH for-5.0 06/31] block: Keep BDRV_O_NO_IO in *inherited_fmt_options

2019-11-27 Thread Max Reitz

bdrv_inherited_fmt_options() is used only by (pseudo-)filter drivers,
namely quorum and blkverify.  Both should pass on BDRV_O_NO_IO to their
children.  There is no reason to clear it in
bdrv_inherited_fmt_options().

Signed-off-by: Max Reitz 
---
 block.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 6c50ad661e..58252007af 100644
--- a/block.c
+++ b/block.c
@@ -1034,7 +1034,7 @@ static void bdrv_inherited_fmt_options(int *child_flags, QDict *child_options,
 child_file.inherit_options(child_flags, child_options,
parent_flags, parent_options);
 
-*child_flags &= ~(BDRV_O_PROTOCOL | BDRV_O_NO_IO);
+*child_flags &= ~BDRV_O_PROTOCOL;
 }
 
 const BdrvChildClass child_format = {
-- 
2.23.0
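
The effect of the one-line change on the final mask can be shown with two small helpers. This sketch looks only at the last statement of bdrv_inherited_fmt_options() and ignores whatever child_file.inherit_options() did to the flags beforehand; FLAG_PROTOCOL, FLAG_NO_IO and both helper functions are hypothetical stand-ins.

```c
#include <assert.h>

/* Stand-in open flags; the values are illustrative only. */
#define FLAG_PROTOCOL  (1 << 0)
#define FLAG_NO_IO     (1 << 1)

/* Final mask before this patch: clears both PROTOCOL and NO_IO. */
static int fmt_child_flags_old(int flags)
{
    return flags & ~(FLAG_PROTOCOL | FLAG_NO_IO);
}

/* Final mask after this patch: only PROTOCOL is cleared. */
static int fmt_child_flags_new(int flags)
{
    return flags & ~FLAG_PROTOCOL;
}
```

With the new mask, a NO_IO bit that is present at this point is passed on to the child instead of being dropped.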

[PATCH for-5.0 20/31] block: Make backing files child_of_bds children

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c | 26 --
 block/backup-top.c  |  2 +-
 block/vvfat.c   |  3 ++-
 tests/test-bdrv-drain.c | 13 +++--
 4 files changed, 30 insertions(+), 14 deletions(-)

diff --git a/block.c b/block.c
index 4b8c33dccc..63fe19fd73 100644
--- a/block.c
+++ b/block.c
@@ -2725,6 +2725,20 @@ static bool bdrv_inherits_from_recursive(BlockDriverState *child,
 return child != NULL;
 }
 
+/*
+ * Return the BdrvChildRole for @bs's backing child.  bs->backing is
+ * mostly used for COW backing children (role = COW), but also for
+ * filtered children (role = FILTERED | PRIMARY).
+ */
+static BdrvChildRole bdrv_backing_role(BlockDriverState *bs)
+{
+if (bs->drv && bs->drv->is_filter) {
+return BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY;
+} else {
+return BDRV_CHILD_COW;
+}
+}
+
 /*
  * Sets the backing file link of a BDS. A new reference is created; callers
  * which don't need their own reference any more must call bdrv_unref().
@@ -2752,8 +2766,8 @@ void bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd,
 goto out;
 }
 
-bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_backing,
-0, errp);
+bs->backing = bdrv_attach_child(bs, backing_hd, "backing", &child_of_bds,
+bdrv_backing_role(bs), errp);
 /* If backing_hd was already part of bs's backing chain, and
  * inherits_from pointed recursively to bs then let's update it to
  * point directly to bs (else it will become NULL). */
@@ -2850,7 +2864,7 @@ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options,
 }
 
 backing_hd = bdrv_open_inherit(backing_filename, reference, options, 0, bs,
-   &child_backing, 0, errp);
+   &child_of_bds, BDRV_CHILD_COW, errp);
 if (!backing_hd) {
 bs->open_flags |= BDRV_O_NO_BACKING;
 error_prepend(errp, "Could not open backing file: ");
@@ -3670,8 +3684,8 @@ int bdrv_reopen_multiple(BlockReopenQueue *bs_queue, Error **errp)
 if (state->replace_backing_bs && state->new_backing_bs) {
 uint64_t nperm, nshared;
 bdrv_child_perm(state->bs, state->new_backing_bs,
-NULL, &child_backing, 0, bs_queue,
-state->perm, state->shared_perm,
+NULL, &child_of_bds, bdrv_backing_role(state->bs),
+bs_queue, state->perm, state->shared_perm,
 &nperm, &nshared);
 ret = bdrv_check_update_perm(state->new_backing_bs, NULL,
  nperm, nshared, NULL, NULL, errp);
@@ -6642,7 +6656,7 @@ void bdrv_refresh_filename(BlockDriverState *bs)
 drv->bdrv_gather_child_options(bs, opts, backing_overridden);
 } else {
 QLIST_FOREACH(child, &bs->children, next) {
-if (child->klass == &child_backing && !backing_overridden) {
+if (child == bs->backing && !backing_overridden) {
 /* We can skip the backing BDS if it has not been overridden */
 continue;
 }
diff --git a/block/backup-top.c b/block/backup-top.c
index 811bc67fc7..ce97c0146a 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -138,7 +138,7 @@ static void backup_top_child_perm(BlockDriverState *bs, BdrvChild *c,
 return;
 }
 
-if (child_class == &child_file) {
+if (!(role & BDRV_CHILD_FILTERED)) {
 /*
  * Target child
  *
diff --git a/block/vvfat.c b/block/vvfat.c
index 0c2f77bece..b8096763d5 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3228,7 +3228,8 @@ static void vvfat_child_perm(BlockDriverState *bs, BdrvChild *c,
 {
 BDRVVVFATState *s = bs->opaque;
 
-assert(c == s->qcow || child_class == &child_backing);
+assert(c == s->qcow ||
+   (child_class == &child_of_bds && (role & BDRV_CHILD_COW)));
 
 if (c == s->qcow) {
 /* This is a private node, nobody should try to attach to it */
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index b3d7960bd0..15393a0140 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -96,7 +96,7 @@ static void bdrv_test_child_perm(BlockDriverState *bs, BdrvChild *c,
  * bdrv_format_default_perms() accepts only these two, so disguise
  * detach_by_driver_cb_parent as one of them.
  */
-if (child_class != &child_file && child_class != &child_backing) {
+if (child_class != &child_file && child_class != &child_of_bds) {
 child_class = &child_file;
 }
 
@@ -1399,8 +1399,8 @@ static void test_detach_indirect(bool by_parent_cb)
 bdrv_ref(a);
 child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_file, 0,
 &error_abort);
-child_a = bdrv_attach_c

[PATCH for-5.0 17/31] raw-format: Split raw_read_options()

2019-11-27 Thread Max Reitz

Split raw_read_options() into one function that actually just reads the
options, and another that applies them.  This will allow us to detect
whether the user has specified any options before attaching the file
child (so we can decide on its role based on the options).

Signed-off-by: Max Reitz 
---
 block/raw-format.c | 110 ++---
 1 file changed, 65 insertions(+), 45 deletions(-)

diff --git a/block/raw-format.c b/block/raw-format.c
index 849981afe4..4d47e59b7a 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -71,20 +71,13 @@ static QemuOptsList raw_create_opts = {
 }
 };
 
-static int raw_read_options(QDict *options, BlockDriverState *bs,
-BDRVRawState *s, Error **errp)
+static int raw_read_options(QDict *options, uint64_t *offset, bool *has_size,
+uint64_t *size, Error **errp)
 {
 Error *local_err = NULL;
 QemuOpts *opts = NULL;
-int64_t real_size = 0;
 int ret;
 
-real_size = bdrv_getlength(bs->file->bs);
-if (real_size < 0) {
-error_setg_errno(errp, -real_size, "Could not get image size");
-return real_size;
-}
-
 opts = qemu_opts_create(&raw_runtime_opts, NULL, 0, &error_abort);
 qemu_opts_absorb_qdict(opts, options, &local_err);
 if (local_err) {
@@ -93,64 +86,84 @@ static int raw_read_options(QDict *options, BlockDriverState *bs,
 goto end;
 }
 
-s->offset = qemu_opt_get_size(opts, "offset", 0);
-if (s->offset > real_size) {
-error_setg(errp, "Offset (%" PRIu64 ") cannot be greater than "
-"size of the containing file (%" PRId64 ")",
-s->offset, real_size);
-ret = -EINVAL;
-goto end;
-}
+*offset = qemu_opt_get_size(opts, "offset", 0);
+*has_size = qemu_opt_find(opts, "size");
+*size = qemu_opt_get_size(opts, "size", 0);
 
-if (qemu_opt_find(opts, "size") != NULL) {
-s->size = qemu_opt_get_size(opts, "size", 0);
-s->has_size = true;
-} else {
-s->has_size = false;
-s->size = real_size - s->offset;
+ret = 0;
+end:
+qemu_opts_del(opts);
+return ret;
+}
+
+static int raw_apply_options(BlockDriverState *bs, BDRVRawState *s,
+ uint64_t offset, bool has_size, uint64_t size,
+ Error **errp)
+{
+int64_t real_size = 0;
+
+real_size = bdrv_getlength(bs->file->bs);
+if (real_size < 0) {
+error_setg_errno(errp, -real_size, "Could not get image size");
+return real_size;
 }
 
 /* Check size and offset */
-if ((real_size - s->offset) < s->size) {
+if (offset > real_size) {
+error_setg(errp, "Offset (%" PRIu64 ") cannot be greater than "
+   "size of the containing file (%" PRId64 ")",
+   s->offset, real_size);
+return -EINVAL;
+}
+
+if (has_size && (real_size - offset) < size) {
 error_setg(errp, "The sum of offset (%" PRIu64 ") and size "
-"(%" PRIu64 ") has to be smaller or equal to the "
-" actual size of the containing file (%" PRId64 ")",
-s->offset, s->size, real_size);
-ret = -EINVAL;
-goto end;
+   "(%" PRIu64 ") has to be smaller or equal to the "
+   " actual size of the containing file (%" PRId64 ")",
+   s->offset, s->size, real_size);
+return -EINVAL;
 }
 
 /* Make sure size is multiple of BDRV_SECTOR_SIZE to prevent rounding
  * up and leaking out of the specified area. */
-if (s->has_size && !QEMU_IS_ALIGNED(s->size, BDRV_SECTOR_SIZE)) {
+if (has_size && !QEMU_IS_ALIGNED(size, BDRV_SECTOR_SIZE)) {
 error_setg(errp, "Specified size is not multiple of %llu",
-BDRV_SECTOR_SIZE);
-ret = -EINVAL;
-goto end;
+   BDRV_SECTOR_SIZE);
+return -EINVAL;
 }
 
-ret = 0;
-
-end:
+s->offset = offset;
+s->has_size = has_size;
+s->size = has_size ? size : real_size - offset;
 
-qemu_opts_del(opts);
-
-return ret;
+return 0;
 }
 
 static int raw_reopen_prepare(BDRVReopenState *reopen_state,
   BlockReopenQueue *queue, Error **errp)
 {
+bool has_size;
+uint64_t offset, size;
+int ret;
+
 assert(reopen_state != NULL);
 assert(reopen_state->bs != NULL);
 
 reopen_state->opaque = g_new0(BDRVRawState, 1);
 
-return raw_read_options(
-reopen_state->options,
-reopen_state->bs,
-reopen_state->opaque,
-errp);
+ret = raw_read_options(reopen_state->options, &offset, &has_size, &size,
+   errp);
+if (ret < 0) {
+return ret;
+}
+
+ret = raw_apply_options(reopen_state->bs, reopen_state->opaque,
+offset, has_size, size, errp);
+if (ret < 0) {
+return ret;
+}
+
+return 0;
 }
 
 static void raw_reopen

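The validation half of the raw_read_options() / raw_apply_options() split can be sketched without any QEMU types. Everything here is an assumption made for illustration: RawOpts stands for the values the read step produced, RawState for the driver state, SECTOR_SIZE for BDRV_SECTOR_SIZE, and real_size for the result of bdrv_getlength(); error reporting is reduced to a return code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512u   /* assumed stand-in for BDRV_SECTOR_SIZE */

/* Result of the "read" step: user options only, no node access needed. */
typedef struct RawOpts {
    uint64_t offset;
    bool has_size;
    uint64_t size;
} RawOpts;

/* Driver state that the "apply" step fills in. */
typedef struct RawState {
    uint64_t offset;
    bool has_size;
    uint64_t size;
} RawState;

/* "Apply" step: validate against the real file size, then commit. */
static int apply_opts(RawState *s, const RawOpts *o, int64_t real_size)
{
    if (real_size < 0) {
        return (int)real_size;          /* length query failed */
    }
    if (o->offset > (uint64_t)real_size) {
        return -EINVAL;                 /* offset beyond end of file */
    }
    if (o->has_size && (uint64_t)real_size - o->offset < o->size) {
        return -EINVAL;                 /* offset + size beyond end of file */
    }
    if (o->has_size && o->size % SECTOR_SIZE != 0) {
        return -EINVAL;                 /* size must be sector-aligned */
    }

    s->offset = o->offset;
    s->has_size = o->has_size;
    s->size = o->has_size ? o->size : (uint64_t)real_size - o->offset;
    return 0;
}
```

Because RawOpts is available before any child is attached, a caller can inspect has_size or offset first and only later run the apply step, which is the ordering the commit message asks for.
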
[PATCH for-5.0 09/31] block: Unify bdrv_child_cb_attach()

2019-11-27 Thread Max Reitz

Make bdrv_child_cb_attach() call bdrv_backing_attach() for children with
a COW role (and drop the reverse call from bdrv_backing_attach()), so it
can be used for any child (with a proper role set).

Because so far no child has a proper role set, we need a temporary new
callback for child_backing.attach that ensures bdrv_backing_attach() is
called for all COW children that do not have their role set yet.

Signed-off-by: Max Reitz 
---
 block.c | 16 +++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index b3decde6c5..24a8910047 100644
--- a/block.c
+++ b/block.c
@@ -920,9 +920,16 @@ static void bdrv_child_cb_drained_end(BdrvChild *child,
 bdrv_drained_end_no_poll(bs, drained_end_counter);
 }
 
+static void bdrv_backing_attach(BdrvChild *c);
+
 static void bdrv_child_cb_attach(BdrvChild *child)
 {
 BlockDriverState *bs = child->opaque;
+
+if (child->role & BDRV_CHILD_COW) {
+bdrv_backing_attach(child);
+}
+
 bdrv_apply_subtree_drain(child, bs);
 }
 
@@ -1133,7 +1140,14 @@ static void bdrv_backing_attach(BdrvChild *c)
 parent->backing_blocker);
 bdrv_op_unblock(backing_hd, BLOCK_OP_TYPE_BACKUP_TARGET,
 parent->backing_blocker);
+}
 
+/* XXX: Will be removed along with child_backing */
+static void bdrv_child_cb_attach_backing(BdrvChild *c)
+{
+if (!(c->role & BDRV_CHILD_COW)) {
+bdrv_backing_attach(c);
+}
 bdrv_child_cb_attach(c);
 }
 
@@ -1192,7 +1206,7 @@ static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
 const BdrvChildClass child_backing = {
 .parent_is_bds   = true,
 .get_parent_desc = bdrv_child_get_parent_desc,
-.attach  = bdrv_backing_attach,
+.attach  = bdrv_child_cb_attach_backing,
 .detach  = bdrv_backing_detach,
 .inherit_options = bdrv_backing_options,
 .drained_begin   = bdrv_child_cb_drained_begin,
-- 
2.23.0
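
The interplay between the generic attach callback and the temporary child_backing wrapper can be modelled with a counter. This is a sketch only: the Child struct, CHILD_COW and the three functions are simplified stand-ins for bdrv_backing_attach(), bdrv_child_cb_attach() and bdrv_child_cb_attach_backing(), with the real blocker setup replaced by a call count.

```c
#include <assert.h>

#define CHILD_COW (1u << 0)   /* hypothetical role bit */

typedef struct Child {
    unsigned int role;
    int backing_attach_calls;  /* how often the backing setup ran */
} Child;

/* Stand-in for bdrv_backing_attach(): just counts invocations. */
static void backing_attach(Child *c)
{
    c->backing_attach_calls++;
}

/* Generic callback: does the backing setup for COW children. */
static void child_cb_attach(Child *c)
{
    if (c->role & CHILD_COW) {
        backing_attach(c);
    }
}

/* Temporary wrapper: covers backing children without a role set yet. */
static void child_cb_attach_backing(Child *c)
{
    if (!(c->role & CHILD_COW)) {
        backing_attach(c);
    }
    child_cb_attach(c);
}
```

Whether or not the COW bit is set, the wrapper runs the backing setup exactly once, which is what allows roles to be introduced gradually without double attachment.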

[PATCH for-5.0 08/31] block: Unify bdrv_*inherited_options()

2019-11-27 Thread Max Reitz

We can keep the logic for child_file, child_format, and child_backing in
a single function and differentiate based on the BdrvChildRole.

Signed-off-by: Max Reitz 
---
 block.c | 96 +++--
 1 file changed, 60 insertions(+), 36 deletions(-)

diff --git a/block.c b/block.c
index f42682478e..b3decde6c5 100644
--- a/block.c
+++ b/block.c
@@ -977,8 +977,8 @@ static void bdrv_temp_snapshot_options(int *child_flags, QDict *child_options,
 }
 
 /*
- * Returns the options and flags that bs->file should get if a protocol driver
- * is expected, based on the given options and flags for the parent BDS
+ * Returns the options and flags that a generic child of a BDS should
+ * get, based on the given options and flags for the parent BDS.
  */
 static void bdrv_inherited_options(BdrvChildRole role,
int *child_flags, QDict *child_options,
@@ -986,8 +986,16 @@ static void bdrv_inherited_options(BdrvChildRole role,
 {
 int flags = parent_flags;
 
-/* Enable protocol handling, disable format probing for bs->file */
-flags |= BDRV_O_PROTOCOL;
+assert((role & (BDRV_CHILD_PROTOCOL | BDRV_CHILD_FORMAT))
+!= (BDRV_CHILD_PROTOCOL | BDRV_CHILD_FORMAT));
+
+if (role & BDRV_CHILD_PROTOCOL) {
+/* Enable protocol handling, disable format probing */
+flags |= BDRV_O_PROTOCOL;
+} else if (role & BDRV_CHILD_FORMAT) {
+/* Enable format handling */
+flags &= ~BDRV_O_PROTOCOL;
+}
 
 /* If the cache mode isn't explicitly set, inherit direct and no-flush from
  * the parent. */
@@ -995,26 +1003,57 @@ static void bdrv_inherited_options(BdrvChildRole role,
 qdict_copy_default(child_options, parent_options, BDRV_OPT_CACHE_NO_FLUSH);
 qdict_copy_default(child_options, parent_options, BDRV_OPT_FORCE_SHARE);
 
-/* Inherit the read-only option from the parent if it's not set */
-qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
-qdict_copy_default(child_options, parent_options, BDRV_OPT_AUTO_READ_ONLY);
+if (role & BDRV_CHILD_COW) {
+/* backing files always opened read-only */
+qdict_set_default_str(child_options, BDRV_OPT_READ_ONLY, "on");
+qdict_set_default_str(child_options, BDRV_OPT_AUTO_READ_ONLY, "off");
+} else {
+/* Inherit the read-only option from the parent if it's not set */
+qdict_copy_default(child_options, parent_options, BDRV_OPT_READ_ONLY);
+qdict_copy_default(child_options, parent_options,
+   BDRV_OPT_AUTO_READ_ONLY);
+}
 
-/* Our block drivers take care to send flushes and respect unmap policy,
- * so we can default to enable both on lower layers regardless of the
- * corresponding parent options. */
-qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
+if (role & BDRV_CHILD_PROTOCOL) {
+/*
+ * Our format drivers (which expect protocol children underneath, hence
+ * the condition) take care to send flushes and respect unmap policy, so
+ * we can default to enable both on lower layers regardless of the
+ * corresponding parent options.
+ */
+qdict_set_default_str(child_options, BDRV_OPT_DISCARD, "unmap");
+}
 
 /* Clear flags that only apply to the top layer */
-flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ |
-   BDRV_O_NO_IO);
+flags &= ~(BDRV_O_SNAPSHOT | BDRV_O_NO_BACKING | BDRV_O_COPY_ON_READ);
+
+if (role & BDRV_CHILD_METADATA) {
+flags &= ~BDRV_O_NO_IO;
+}
+if (role & BDRV_CHILD_COW) {
+flags &= ~BDRV_O_TEMPORARY;
+}
 
 *child_flags = flags;
 }
 
+/*
+ * Returns the options and flags that bs->file should get if a protocol driver
+ * is expected, based on the given options and flags for the parent BDS
+ */
+static void bdrv_inherited_file_options(BdrvChildRole role,
+int *child_flags, QDict *child_options,
+int parent_flags, QDict *parent_options)
+{
+bdrv_inherited_options(BDRV_CHILD_IMAGE,
+   child_flags, child_options,
+   parent_flags, parent_options);
+}
+
 const BdrvChildClass child_file = {
 .parent_is_bds   = true,
 .get_parent_desc = bdrv_child_get_parent_desc,
-.inherit_options = bdrv_inherited_options,
+.inherit_options = bdrv_inherited_file_options,
 .drained_begin   = bdrv_child_cb_drained_begin,
 .drained_poll= bdrv_child_cb_drained_poll,
 .drained_end = bdrv_child_cb_drained_end,
@@ -1034,10 +1073,9 @@ static void bdrv_inherited_fmt_options(BdrvChildRole 
role,
int *child_flags, QDict *child_options,
int parent_flags, QDict *parent_options)
 {
-child_file.inherit_options(role, child_flags, child_op

[PATCH for-5.0 21/31] block: Drop child_backing

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c   | 62 +++
 include/block/block_int.h |  1 -
 2 files changed, 4 insertions(+), 59 deletions(-)

diff --git a/block.c b/block.c
index 63fe19fd73..0530c81c6d 100644
--- a/block.c
+++ b/block.c
@@ -1147,15 +1147,6 @@ static void bdrv_backing_attach(BdrvChild *c)
 parent->backing_blocker);
 }
 
-/* XXX: Will be removed along with child_backing */
-static void bdrv_child_cb_attach_backing(BdrvChild *c)
-{
-if (!(c->role & BDRV_CHILD_COW)) {
-bdrv_backing_attach(c);
-}
-bdrv_child_cb_attach(c);
-}
-
 static void bdrv_backing_detach(BdrvChild *c)
 {
 BlockDriverState *parent = c->opaque;
@@ -1166,28 +1157,6 @@ static void bdrv_backing_detach(BdrvChild *c)
 parent->backing_blocker = NULL;
 }
 
-/* XXX: Will be removed along with child_backing */
-static void bdrv_child_cb_detach_backing(BdrvChild *c)
-{
-if (!(c->role & BDRV_CHILD_COW)) {
-bdrv_backing_detach(c);
-}
-bdrv_child_cb_detach(c);
-}
-
-/*
- * Returns the options and flags that bs->backing should get, based on the
- * given options and flags for the parent BDS
- */
-static void bdrv_backing_options(BdrvChildRole role,
- int *child_flags, QDict *child_options,
- int parent_flags, QDict *parent_options)
-{
-bdrv_inherited_options(BDRV_CHILD_COW,
-   child_flags, child_options,
-   parent_flags, parent_options);
-}
-
 static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
 const char *filename, Error **errp)
 {
@@ -1215,21 +1184,6 @@ static int bdrv_backing_update_filename(BdrvChild *c, BlockDriverState *base,
 return ret;
 }
 
-const BdrvChildClass child_backing = {
-.parent_is_bds   = true,
-.get_parent_desc = bdrv_child_get_parent_desc,
-.attach  = bdrv_child_cb_attach_backing,
-.detach  = bdrv_child_cb_detach_backing,
-.inherit_options = bdrv_backing_options,
-.drained_begin   = bdrv_child_cb_drained_begin,
-.drained_poll= bdrv_child_cb_drained_poll,
-.drained_end = bdrv_child_cb_drained_end,
-.inactivate  = bdrv_child_cb_inactivate,
-.update_filename = bdrv_backing_update_filename,
-.can_set_aio_ctx = bdrv_child_cb_can_set_aio_ctx,
-.set_aio_ctx = bdrv_child_cb_set_aio_ctx,
-};
-
 static int bdrv_open_flags(BlockDriverState *bs, int flags)
 {
 int open_flags = flags;
@@ -2235,8 +2189,7 @@ static void bdrv_default_perms_for_backing(BlockDriverState *bs, BdrvChild *c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
-assert(child_class == &child_backing ||
-   (child_class == &child_of_bds && (role & BDRV_CHILD_COW)));
+assert(child_class == &child_of_bds && (role & BDRV_CHILD_COW));
 
 /*
  * We want consistent read from backing files if the parent needs it.
@@ -2348,23 +2301,16 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
-bool backing = (child_class == &child_backing);
-
 if (child_class == &child_of_bds) {
 bdrv_default_perms(bs, c, child_class, role, reopen_queue,
perm, shared, nperm, nshared);
 return;
 }
 
-assert(child_class == &child_backing || child_class == &child_file);
+assert(child_class == &child_file);
 
-if (!backing) {
-bdrv_default_perms_for_metadata(bs, c, child_class, role, reopen_queue,
-perm, shared, nperm, nshared);
-} else {
-bdrv_default_perms_for_backing(bs, c, child_class, role, reopen_queue,
-   perm, shared, nperm, nshared);
-}
+bdrv_default_perms_for_metadata(bs, c, child_class, role, reopen_queue,
+perm, shared, nperm, nshared);
 }
 
 void bdrv_default_perms(BlockDriverState *bs, BdrvChild *c,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index fe6206b210..895bcf4d30 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -731,7 +731,6 @@ struct BdrvChildClass {
 
 extern const BdrvChildClass child_of_bds;
 extern const BdrvChildClass child_file;
-extern const BdrvChildClass child_backing;
 
 struct BdrvChild {
 BlockDriverState *bs;
-- 
2.23.0

[PATCH for-5.0 23/31] block: Make filter drivers use child_of_bds

2019-11-27 Thread Max Reitz

Note that some filters have secondary children, namely blkverify (the
image to be verified) and blklogwrites (the log).  This patch does not
touch those children.

Signed-off-by: Max Reitz 
---
 block/blkdebug.c | 4 +++-
 block/blklogwrites.c | 3 ++-
 block/blkverify.c| 5 -
 block/copy-on-read.c | 5 +++--
 block/replication.c  | 3 ++-
 block/throttle.c | 5 +++--
 6 files changed, 17 insertions(+), 8 deletions(-)

diff --git a/block/blkdebug.c b/block/blkdebug.c
index 8dd8ed6055..b31fa40b0e 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -497,7 +497,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict *options, int flags,
 
 /* Open the image file */
 bs->file = bdrv_open_child(qemu_opt_get(opts, "x-image"), options, "image",
-   bs, &child_file, 0, false, &local_err);
+   bs, &child_of_bds,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+   false, &local_err);
 if (local_err) {
 ret = -EINVAL;
 error_propagate(errp, local_err);
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 4faf912ef1..78b0c49460 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -157,7 +157,8 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
 }
 
 /* Open the file */
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0, false,
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, false,
&local_err);
 if (local_err) {
 ret = -EINVAL;
diff --git a/block/blkverify.c b/block/blkverify.c
index 4f4d079b12..7df4cb8007 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -125,7 +125,10 @@ static int blkverify_open(BlockDriverState *bs, QDict *options, int flags,
 
 /* Open the raw file */
 bs->file = bdrv_open_child(qemu_opt_get(opts, "x-raw"), options, "raw",
-   bs, &child_file, 0, false, &local_err);
+   bs, &child_of_bds,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PROTOCOL |
+   BDRV_CHILD_PRIMARY,
+   false, &local_err);
 if (local_err) {
 ret = -EINVAL;
 error_propagate(errp, local_err);
diff --git a/block/copy-on-read.c b/block/copy-on-read.c
index a2d92ac394..c857ea0da7 100644
--- a/block/copy-on-read.c
+++ b/block/copy-on-read.c
@@ -28,8 +28,9 @@
 static int cor_open(BlockDriverState *bs, QDict *options, int flags,
 Error **errp)
 {
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0, false,
-   errp);
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+   false, errp);
 if (!bs->file) {
 return -EINVAL;
 }
diff --git a/block/replication.c b/block/replication.c
index 9ca5c9368e..ec512ae1c3 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -90,7 +90,8 @@ static int replication_open(BlockDriverState *bs, QDict *options,
 const char *mode;
 const char *top_id;
 
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0,
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
false, errp);
 if (!bs->file) {
 return -EINVAL;
diff --git a/block/throttle.c b/block/throttle.c
index 2dea913be7..47b0a3522d 100644
--- a/block/throttle.c
+++ b/block/throttle.c
@@ -81,8 +81,9 @@ static int throttle_open(BlockDriverState *bs, QDict *options,
 char *group;
 int ret;
 
-bs->file = bdrv_open_child(NULL, options, "file", bs,
-   &child_file, 0, false, errp);
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+   false, errp);
 if (!bs->file) {
 return -EINVAL;
 }
-- 
2.23.0

[PATCH for-5.0 19/31] block: Drop child_format

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c   | 28 
 include/block/block_int.h |  1 -
 2 files changed, 29 deletions(-)

diff --git a/block.c b/block.c
index 5b38c7799a..4b8c33dccc 100644
--- a/block.c
+++ b/block.c
@@ -1104,34 +1104,6 @@ const BdrvChildClass child_file = {
 .set_aio_ctx = bdrv_child_cb_set_aio_ctx,
 };
 
-/*
- * Returns the options and flags that bs->file should get if the use of formats
- * (and not only protocols) is permitted for it, based on the given options and
- * flags for the parent BDS
- */
-static void bdrv_inherited_fmt_options(BdrvChildRole role,
-   int *child_flags, QDict *child_options,
-   int parent_flags, QDict *parent_options)
-{
-bdrv_inherited_options(BDRV_CHILD_DATA | BDRV_CHILD_FORMAT,
-   child_flags, child_options,
-   parent_flags, parent_options);
-}
-
-const BdrvChildClass child_format = {
-.parent_is_bds   = true,
-.get_parent_desc = bdrv_child_get_parent_desc,
-.inherit_options = bdrv_inherited_fmt_options,
-.drained_begin   = bdrv_child_cb_drained_begin,
-.drained_poll= bdrv_child_cb_drained_poll,
-.drained_end = bdrv_child_cb_drained_end,
-.attach  = bdrv_child_cb_attach,
-.detach  = bdrv_child_cb_detach,
-.inactivate  = bdrv_child_cb_inactivate,
-.can_set_aio_ctx = bdrv_child_cb_can_set_aio_ctx,
-.set_aio_ctx = bdrv_child_cb_set_aio_ctx,
-};
-
 static void bdrv_backing_attach(BdrvChild *c)
 {
 BlockDriverState *parent = c->opaque;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index b9375ceb1c..fe6206b210 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -731,7 +731,6 @@ struct BdrvChildClass {
 
 extern const BdrvChildClass child_of_bds;
 extern const BdrvChildClass child_file;
-extern const BdrvChildClass child_format;
 extern const BdrvChildClass child_backing;
 
 struct BdrvChild {
-- 
2.23.0

[PATCH for-5.0 12/31] block: Distinguish paths in *_format_default_perms

2019-11-27 Thread Max Reitz

bdrv_format_default_perms() has one code path for backing files and one
for storage files.  We want to pull them out into their own functions,
so first make the two paths completely distinct; that keeps the next
patches a bit cleaner.
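
As an illustration of what "completely distinct" means here, a minimal
standalone sketch follows.  It models only the lines visible in the hunk
below (the rest of each branch is left out), and every name and bit value
in it is a hypothetical stand-in, not the definition from the QEMU tree.
The point is that each branch handles the inactive case and writes the
outputs itself, so either branch can later be moved into its own function
without touching the other.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical permission bits; names and values are assumptions. */
enum {
    SKETCH_PERM_CONSISTENT_READ = 1u << 0,
    SKETCH_PERM_WRITE           = 1u << 1,
    SKETCH_PERM_WRITE_UNCHANGED = 1u << 2,
    SKETCH_PERM_RESIZE          = 1u << 3,
    SKETCH_PERM_GRAPH_MOD       = 1u << 4,
};

/* Simplified model: "backing" selects the path, "inactive" stands in
 * for the BDRV_O_INACTIVE check. */
static void sketch_default_perms(bool backing, bool inactive,
                                 uint64_t perm, uint64_t shared,
                                 uint64_t *nperm, uint64_t *nshared)
{
    if (!backing) {
        /* Storage path: do not share write or resize. */
        shared &= ~(uint64_t)(SKETCH_PERM_WRITE | SKETCH_PERM_RESIZE);

        /* Tail duplicated on purpose so this branch is self-contained. */
        if (inactive) {
            shared |= SKETCH_PERM_WRITE | SKETCH_PERM_RESIZE;
        }
        *nperm = perm;
        *nshared = shared;
    } else {
        /* Backing path: always share these permissions. */
        shared |= SKETCH_PERM_CONSISTENT_READ | SKETCH_PERM_GRAPH_MOD |
                  SKETCH_PERM_WRITE_UNCHANGED;

        /* Same tail again, local to this branch. */
        if (inactive) {
            shared |= SKETCH_PERM_WRITE | SKETCH_PERM_RESIZE;
        }
        *nperm = perm;
        *nshared = shared;
    }
}
```

The duplication is deliberate and temporary: once each branch lives in
its own function, the shared tail is no longer duplicated inside a single
function body.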

Signed-off-by: Max Reitz 
---
 block.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/block.c b/block.c
index 8542768d35..eb282f0977 100644
--- a/block.c
+++ b/block.c
@@ -2285,6 +2285,13 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 perm |= BLK_PERM_CONSISTENT_READ;
 }
 shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
+
+if (bs->open_flags & BDRV_O_INACTIVE) {
+shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+}
+
+*nperm = perm;
+*nshared = shared;
 } else {
 /* We want consistent read from backing files if the parent needs it.
  * No other operations are performed on backing files. */
@@ -2301,14 +2308,14 @@ void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
 
 shared |= BLK_PERM_CONSISTENT_READ | BLK_PERM_GRAPH_MOD |
   BLK_PERM_WRITE_UNCHANGED;
-}
 
-if (bs->open_flags & BDRV_O_INACTIVE) {
-shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
-}
+if (bs->open_flags & BDRV_O_INACTIVE) {
+shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+}
 
-*nperm = perm;
-*nshared = shared;
+*nperm = perm;
+*nshared = shared;
+}
 }
 
 uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm)
-- 
2.23.0

[PATCH for-5.0 22/31] block: Make format drivers use child_of_bds

2019-11-27 Thread Max Reitz

In most cases they need to pass the BDRV_CHILD_IMAGE role set as the
BdrvChildRole; the exceptions are drivers with external data files
(qcow2 and vmdk).
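
A rough sketch of that rule, based only on the hunks visible below: the
plain "file" child of a format driver gets the IMAGE set, while qcow2's
"data-file" child gets DATA | PROTOCOL.  All names and bit values here
are hypothetical, and the composition of SKETCH_CHILD_IMAGE in
particular is an assumption; the series defines the real set elsewhere.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical role bits; names and values are assumptions. */
enum {
    SKETCH_CHILD_DATA     = 1u << 0,
    SKETCH_CHILD_METADATA = 1u << 1,
    SKETCH_CHILD_PRIMARY  = 1u << 4,
    SKETCH_CHILD_PROTOCOL = 1u << 5,

    /* Assumed composition, for illustration only. */
    SKETCH_CHILD_IMAGE    = SKETCH_CHILD_DATA | SKETCH_CHILD_METADATA |
                            SKETCH_CHILD_PRIMARY | SKETCH_CHILD_PROTOCOL,
};

/* Pick the role for a format driver's child.  The flag says whether the
 * child being opened is an external data file (as in the qcow2 hunk)
 * rather than the driver's ordinary "file" child. */
static unsigned sketch_format_child_role(bool is_external_data_file_child)
{
    if (is_external_data_file_child) {
        /* Guest data only, stored in a protocol node. */
        return SKETCH_CHILD_DATA | SKETCH_CHILD_PROTOCOL;
    }
    /* Common case: one child holding everything. */
    return SKETCH_CHILD_IMAGE;
}
```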

Signed-off-by: Max Reitz 
---
 block/bochs.c |  4 ++--
 block/cloop.c |  4 ++--
 block/crypto.c|  4 ++--
 block/dmg.c   |  4 ++--
 block/parallels.c |  4 ++--
 block/qcow.c  |  4 ++--
 block/qcow2.c | 20 +++-
 block/qed.c   |  4 ++--
 block/vdi.c   |  4 ++--
 block/vhdx.c  |  4 ++--
 block/vmdk.c  | 20 +---
 block/vpc.c   |  4 ++--
 12 files changed, 52 insertions(+), 28 deletions(-)

diff --git a/block/bochs.c b/block/bochs.c
index cd399a4ad3..15f9807954 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -110,8 +110,8 @@ static int bochs_open(BlockDriverState *bs, QDict *options, int flags,
 return ret;
 }
 
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0,
-   false, errp);
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_IMAGE, false, errp);
 if (!bs->file) {
 return -EINVAL;
 }
diff --git a/block/cloop.c b/block/cloop.c
index 42a8b0f107..6662af7470 100644
--- a/block/cloop.c
+++ b/block/cloop.c
@@ -71,8 +71,8 @@ static int cloop_open(BlockDriverState *bs, QDict *options, int flags,
 return ret;
 }
 
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0,
-   false, errp);
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_IMAGE, false, errp);
 if (!bs->file) {
 return -EINVAL;
 }
diff --git a/block/crypto.c b/block/crypto.c
index 737042010a..b5e31aee6f 100644
--- a/block/crypto.c
+++ b/block/crypto.c
@@ -200,8 +200,8 @@ static int block_crypto_open_generic(QCryptoBlockFormat format,
 unsigned int cflags = 0;
 QDict *cryptoopts = NULL;
 
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0,
-   false, errp);
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_IMAGE, false, errp);
 if (!bs->file) {
 return -EINVAL;
 }
diff --git a/block/dmg.c b/block/dmg.c
index 9fcd59af8d..479d764d82 100644
--- a/block/dmg.c
+++ b/block/dmg.c
@@ -439,8 +439,8 @@ static int dmg_open(BlockDriverState *bs, QDict *options, int flags,
 return ret;
 }
 
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0,
-   false, errp);
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_IMAGE, false, errp);
 if (!bs->file) {
 return -EINVAL;
 }
diff --git a/block/parallels.c b/block/parallels.c
index 769e4d0e29..8ff7bfcc40 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -728,8 +728,8 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
 Error *local_err = NULL;
 char *buf;
 
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0,
-   false, errp);
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_IMAGE, false, errp);
 if (!bs->file) {
 return -EINVAL;
 }
diff --git a/block/qcow.c b/block/qcow.c
index 3138894eab..c81a687195 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -130,8 +130,8 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
 qdict_extract_subqdict(options, &encryptopts, "encrypt.");
 encryptfmt = qdict_get_try_str(encryptopts, "format");
 
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0,
-   false, errp);
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   BDRV_CHILD_IMAGE, false, errp);
 if (!bs->file) {
 ret = -EINVAL;
 goto fail;
diff --git a/block/qcow2.c b/block/qcow2.c
index 89a4e5a4e4..22d70ce62f 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -1535,8 +1535,10 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 }
 
 /* Open external data file */
-s->data_file = bdrv_open_child(NULL, options, "data-file", bs, &child_file,
-   0, true, &local_err);
+s->data_file = bdrv_open_child(NULL, options, "data-file", bs,
+   &child_of_bds,
+   BDRV_CHILD_DATA | BDRV_CHILD_PROTOCOL,
+   true, &local_err);
 if (local_err) {
 error_propagate(errp, local_err);
 ret = -EINVAL;
@@ -1546,7 +1548,9 @@ static int coroutine_fn qcow2_do_open(BlockDriverState *bs, QDict *options,
 if (s->incompatible_features & QCOW2_INCOMPAT_DATA_F

[PATCH for-5.0 24/31] block: Use child_of_bds in remaining places

2019-11-27 Thread Max Reitz

Replace child_file with child_of_bds in all remaining places (excluding
tests).
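
The one non-mechanical case in this patch is raw-format, where the role
depends on the options.  A standalone sketch of that decision, mirroring
the raw_open() hunk below; the names and bit values are hypothetical
stand-ins, not the ones from the QEMU headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical role bits; names and values are assumptions. */
enum {
    SKETCH_CHILD_DATA     = 1u << 0,
    SKETCH_CHILD_FILTERED = 1u << 2,
    SKETCH_CHILD_PRIMARY  = 1u << 4,
    SKETCH_CHILD_PROTOCOL = 1u << 5,
};

/* Without an offset or a size limit, raw passes everything through and
 * so behaves like a filter; with either limit it exposes only part of
 * the file and is treated as holding data instead. */
static unsigned sketch_raw_file_role(uint64_t offset, bool has_size)
{
    if (offset || has_size) {
        return SKETCH_CHILD_DATA | SKETCH_CHILD_PROTOCOL |
               SKETCH_CHILD_PRIMARY;
    }
    return SKETCH_CHILD_FILTERED | SKETCH_CHILD_PROTOCOL |
           SKETCH_CHILD_PRIMARY;
}
```

Note that a size limit alone is enough to leave the filter case, even
with an offset of zero.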

Signed-off-by: Max Reitz 
---
 block.c  |  3 ++-
 block/backup-top.c   |  4 ++--
 block/blklogwrites.c |  3 ++-
 block/blkreplay.c|  5 +++--
 block/raw-format.c   | 16 ++--
 5 files changed, 23 insertions(+), 8 deletions(-)

diff --git a/block.c b/block.c
index 0530c81c6d..e800ce862e 100644
--- a/block.c
+++ b/block.c
@@ -3182,7 +3182,8 @@ static BlockDriverState *bdrv_open_inherit(const char *filename,
 BlockDriverState *file_bs;
 
 file_bs = bdrv_open_child_bs(filename, options, "file", bs,
- &child_file, 0, true, &local_err);
+ &child_of_bds, BDRV_CHILD_IMAGE,
+ true, &local_err);
 if (local_err) {
 goto fail;
 }
diff --git a/block/backup-top.c b/block/backup-top.c
index ce97c0146a..eccd6bfae0 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -200,8 +200,8 @@ BlockDriverState *bdrv_backup_top_append(BlockDriverState *source,
 top->opaque = state = g_new0(BDRVBackupTopState, 1);
 
 bdrv_ref(target);
-state->target = bdrv_attach_child(top, target, "target", &child_file, 0,
-  errp);
+state->target = bdrv_attach_child(top, target, "target", &child_of_bds,
+  BDRV_CHILD_DATA, errp);
 if (!state->target) {
 bdrv_unref(target);
 bdrv_unref(top);
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 78b0c49460..3ee991b38e 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -167,7 +167,8 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict *options, int flags,
 }
 
 /* Open the log file */
-s->log_file = bdrv_open_child(NULL, options, "log", bs, &child_file, 0,
+s->log_file = bdrv_open_child(NULL, options, "log", bs, &child_of_bds,
+  BDRV_CHILD_METADATA | BDRV_CHILD_PROTOCOL,
   false, &local_err);
 if (local_err) {
 ret = -EINVAL;
diff --git a/block/blkreplay.c b/block/blkreplay.c
index f97493f45a..71628f4d56 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -27,8 +27,9 @@ static int blkreplay_open(BlockDriverState *bs, QDict *options, int flags,
 int ret;
 
 /* Open the image file */
-bs->file = bdrv_open_child(NULL, options, "image",
-   bs, &child_file, 0, false, &local_err);
+bs->file = bdrv_open_child(NULL, options, "image", bs, &child_of_bds,
+   BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
+   false, &local_err);
 if (local_err) {
 ret = -EINVAL;
 error_propagate(errp, local_err);
diff --git a/block/raw-format.c b/block/raw-format.c
index 4d47e59b7a..6c46dee996 100644
--- a/block/raw-format.c
+++ b/block/raw-format.c
@@ -444,6 +444,7 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
 BDRVRawState *s = bs->opaque;
 bool has_size;
 uint64_t offset, size;
+BdrvChildRole file_role;
 int ret;
 
 ret = raw_read_options(options, &offset, &has_size, &size, errp);
@@ -451,8 +452,19 @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags,
 return ret;
 }
 
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_file, 0,
-   false, errp);
+/*
+ * Without offset and a size limit, this driver behaves very much
+ * like a filter.  With any such limit, it does not.
+ */
+if (offset || has_size) {
+file_role = BDRV_CHILD_DATA | BDRV_CHILD_PROTOCOL | BDRV_CHILD_PRIMARY;
+} else {
+file_role = BDRV_CHILD_FILTERED | BDRV_CHILD_PROTOCOL |
+BDRV_CHILD_PRIMARY;
+}
+
+bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+   file_role, false, errp);
 if (!bs->file) {
 return -EINVAL;
 }
-- 
2.23.0

[PATCH for-5.0 28/31] block: Drop bdrv_format_default_perms()

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c   | 19 ---
 include/block/block_int.h | 11 ---
 2 files changed, 30 deletions(-)

diff --git a/block.c b/block.c
index b6c92ef283..3fcd56aaae 100644
--- a/block.c
+++ b/block.c
@@ -2294,25 +2294,6 @@ static void bdrv_default_perms_for_data(BlockDriverState *bs, BdrvChild *c,
 *nshared = shared;
 }
 
-void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
-   const BdrvChildClass *child_class,
-   BdrvChildRole role,
-   BlockReopenQueue *reopen_queue,
-   uint64_t perm, uint64_t shared,
-   uint64_t *nperm, uint64_t *nshared)
-{
-if (child_class == &child_of_bds) {
-bdrv_default_perms(bs, c, child_class, role, reopen_queue,
-   perm, shared, nperm, nshared);
-return;
-}
-
-assert(child_class == &child_file);
-
-bdrv_default_perms_for_metadata(bs, c, child_class, role, reopen_queue,
-perm, shared, nperm, nshared);
-}
-
 void bdrv_default_perms(BlockDriverState *bs, BdrvChild *c,
 const BdrvChildClass *child_class, BdrvChildRole role,
 BlockReopenQueue *reopen_queue,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 7818734708..05e7a27318 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1243,17 +1243,6 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
  */
 int bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp);
 
-/* Default implementation for BlockDriver.bdrv_child_perm() that can be used by
- * (non-raw) image formats: Like above for bs->backing, but for bs->file it
- * requires WRITE | RESIZE for read-write images, always requires
- * CONSISTENT_READ and doesn't share WRITE. */
-void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
-   const BdrvChildClass *child_class,
-   BdrvChildRole child_role,
-   BlockReopenQueue *reopen_queue,
-   uint64_t perm, uint64_t shared,
-   uint64_t *nperm, uint64_t *nshared);
-
 bool bdrv_recurse_can_replace(BlockDriverState *bs,
   BlockDriverState *to_replace);
 
-- 
2.23.0

[PATCH for-5.0 10/31] block: Unify bdrv_child_cb_detach()

2019-11-27 Thread Max Reitz

Make bdrv_child_cb_detach() call bdrv_backing_detach() for children with
a COW role (and drop the reverse call from bdrv_backing_detach()), so it
can be used for any child (with a proper role set).

Because so far no child has a proper role set, we need a temporary new
callback for child_backing.detach that ensures bdrv_backing_detach() is
called for all COW children that do not have their role set yet.
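
To make the transition easier to follow, here is a small standalone
model of the two callbacks.  The struct, the counters and the bit value
are hypothetical and exist only so the behaviour can be checked; the real
functions operate on BdrvChild and do actual work.

```c
#include <assert.h>

/* Hypothetical stand-in for the COW role bit; the value is an assumption. */
enum { SKETCH_CHILD_COW = 1u << 3 };

/* Minimal stand-in for a child; the counters record which paths ran. */
typedef struct SketchChild {
    unsigned role;
    int backing_detach_calls;
    int generic_detach_calls;
} SketchChild;

static void sketch_backing_detach(SketchChild *c)
{
    c->backing_detach_calls++;
}

/* Unified callback: does the backing-specific part for COW children. */
static void sketch_child_cb_detach(SketchChild *c)
{
    if (c->role & SKETCH_CHILD_COW) {
        sketch_backing_detach(c);
    }
    c->generic_detach_calls++;
}

/* Temporary wrapper: covers backing children whose role is not set yet,
 * and avoids a second backing detach once the role is set. */
static void sketch_child_cb_detach_backing(SketchChild *c)
{
    if (!(c->role & SKETCH_CHILD_COW)) {
        sketch_backing_detach(c);
    }
    sketch_child_cb_detach(c);
}
```

Whichever way the role is set, a child that goes through the wrapper sees
exactly one backing detach and one generic detach, which is why the
wrapper can be dropped once every backing child carries the COW role.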

Signed-off-by: Max Reitz 
---
 block.c | 15 ++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/block.c b/block.c
index 24a8910047..89214efa36 100644
--- a/block.c
+++ b/block.c
@@ -921,6 +921,7 @@ static void bdrv_child_cb_drained_end(BdrvChild *child,
 }
 
 static void bdrv_backing_attach(BdrvChild *c);
+static void bdrv_backing_detach(BdrvChild *c);
 
 static void bdrv_child_cb_attach(BdrvChild *child)
 {
@@ -936,6 +937,11 @@ static void bdrv_child_cb_attach(BdrvChild *child)
 static void bdrv_child_cb_detach(BdrvChild *child)
 {
 BlockDriverState *bs = child->opaque;
+
+if (child->role & BDRV_CHILD_COW) {
+bdrv_backing_detach(child);
+}
+
 bdrv_unapply_subtree_drain(child, bs);
 }
 
@@ -1159,7 +1165,14 @@ static void bdrv_backing_detach(BdrvChild *c)
 bdrv_op_unblock_all(c->bs, parent->backing_blocker);
 error_free(parent->backing_blocker);
 parent->backing_blocker = NULL;
+}
 
+/* XXX: Will be removed along with child_backing */
+static void bdrv_child_cb_detach_backing(BdrvChild *c)
+{
+if (!(c->role & BDRV_CHILD_COW)) {
+bdrv_backing_detach(c);
+}
 bdrv_child_cb_detach(c);
 }
 
@@ -1207,7 +1220,7 @@ const BdrvChildClass child_backing = {
 .parent_is_bds   = true,
 .get_parent_desc = bdrv_child_get_parent_desc,
 .attach  = bdrv_child_cb_attach_backing,
-.detach  = bdrv_backing_detach,
+.detach  = bdrv_child_cb_detach_backing,
 .inherit_options = bdrv_backing_options,
 .drained_begin   = bdrv_child_cb_drained_begin,
 .drained_poll= bdrv_child_cb_drained_poll,
-- 
2.23.0

[PATCH for-5.0 30/31] block: Pass BdrvChildRole in remaining cases

2019-11-27 Thread Max Reitz

These calls have no real use for the child role yet, but it does no harm
to pass one.

Notably, the bdrv_root_attach_child() call in blockjob.c is left
unmodified because there is not much the generic BlockJob object wants
from its children.

Signed-off-by: Max Reitz 
---
 block/block-backend.c | 11 +++
 block/vvfat.c |  2 +-
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/block/block-backend.c b/block/block-backend.c
index 98f3167fa6..988633178a 100644
--- a/block/block-backend.c
+++ b/block/block-backend.c
@@ -401,8 +401,9 @@ BlockBackend *blk_new_open(const char *filename, const char *reference,
 return NULL;
 }
 
-blk->root = bdrv_root_attach_child(bs, "root", &child_root, 0, blk->ctx,
-   perm, BLK_PERM_ALL, blk, errp);
+blk->root = bdrv_root_attach_child(bs, "root", &child_root,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+   blk->ctx, perm, BLK_PERM_ALL, blk, errp);
 if (!blk->root) {
 blk_unref(blk);
 return NULL;
@@ -812,8 +813,10 @@ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp)
 {
 ThrottleGroupMember *tgm = &blk->public.throttle_group_member;
 bdrv_ref(bs);
-blk->root = bdrv_root_attach_child(bs, "root", &child_root, 0, blk->ctx,
-   blk->perm, blk->shared_perm, blk, errp);
+blk->root = bdrv_root_attach_child(bs, "root", &child_root,
+   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
+   blk->ctx, blk->perm, blk->shared_perm,
+   blk, errp);
 if (blk->root == NULL) {
 return -EPERM;
 }
diff --git a/block/vvfat.c b/block/vvfat.c
index b8096763d5..8fa8ddff98 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -3193,7 +3193,7 @@ static int enable_write_target(BlockDriverState *bs, Error **errp)
 options = qdict_new();
 qdict_put_str(options, "write-target.driver", "qcow");
 s->qcow = bdrv_open_child(s->qcow_filename, options, "write-target", bs,
-  &child_vvfat_qcow, 0, false, errp);
+  &child_vvfat_qcow, BDRV_CHILD_DATA, false, errp);
 qobject_unref(options);
 if (!s->qcow) {
 ret = -EINVAL;
-- 
2.23.0

[PATCH for-5.0 25/31] tests: Use child_of_bds instead of child_file

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 tests/test-bdrv-drain.c | 29 +
 tests/test-bdrv-graph-mod.c |  6 --
 2 files changed, 21 insertions(+), 14 deletions(-)

diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 15393a0140..91567ca97d 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -97,7 +97,7 @@ static void bdrv_test_child_perm(BlockDriverState *bs, BdrvChild *c,
  * detach_by_driver_cb_parent as one of them.
  */
 if (child_class != &child_file && child_class != &child_of_bds) {
-child_class = &child_file;
+child_class = &child_of_bds;
 }
 
 bdrv_format_default_perms(bs, c, child_class, role, reopen_queue,
@@ -1203,7 +1203,8 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
 
 null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
 &error_abort);
-bdrv_attach_child(bs, null_bs, "null-child", &child_file, 0, &error_abort);
+bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds,
+  BDRV_CHILD_DATA, &error_abort);
 
 /* This child will be the one to pass to requests through to, and
  * it will stall until a drain occurs */
@@ -1211,14 +1212,17 @@ static void do_test_delete_by_drain(bool detach_instead_of_delete,
 &error_abort);
 child_bs->total_sectors = 65536 >> BDRV_SECTOR_BITS;
 /* Takes our reference to child_bs */
-tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child", &child_file,
-0, &error_abort);
+tts->wait_child = bdrv_attach_child(bs, child_bs, "wait-child",
+&child_of_bds,
+BDRV_CHILD_DATA | BDRV_CHILD_PRIMARY,
+&error_abort);
 
 /* This child is just there to be deleted
  * (for detach_instead_of_delete == true) */
 null_bs = bdrv_open("null-co://", NULL, NULL, BDRV_O_RDWR | BDRV_O_PROTOCOL,
 &error_abort);
-bdrv_attach_child(bs, null_bs, "null-child", &child_file, 0, &error_abort);
+bdrv_attach_child(bs, null_bs, "null-child", &child_of_bds, BDRV_CHILD_DATA,
+  &error_abort);
 
 blk = blk_new(qemu_get_aio_context(), BLK_PERM_ALL, BLK_PERM_ALL);
 blk_insert_bs(blk, bs, &error_abort);
@@ -1315,7 +1319,8 @@ static void detach_indirect_bh(void *opaque)
 
 bdrv_ref(data->c);
 data->child_c = bdrv_attach_child(data->parent_b, data->c, "PB-C",
-  &child_file, 0, &error_abort);
+  &child_of_bds, BDRV_CHILD_DATA,
+  &error_abort);
 }
 
 static void detach_by_parent_aio_cb(void *opaque, int ret)
@@ -1332,7 +1337,7 @@ static void detach_by_driver_cb_drained_begin(BdrvChild *child)
 {
 aio_bh_schedule_oneshot(qemu_get_current_aio_context(),
 detach_indirect_bh, &detach_by_parent_data);
-child_file.drained_begin(child);
+child_of_bds.drained_begin(child);
 }
 
 static BdrvChildClass detach_by_driver_cb_class;
@@ -1367,7 +1372,7 @@ static void test_detach_indirect(bool by_parent_cb)
 QEMUIOVector qiov = QEMU_IOVEC_INIT_BUF(qiov, NULL, 0);
 
 if (!by_parent_cb) {
-detach_by_driver_cb_class = child_file;
+detach_by_driver_cb_class = child_of_bds;
 detach_by_driver_cb_class.drained_begin =
 detach_by_driver_cb_drained_begin;
 }
@@ -1397,15 +1402,15 @@ static void test_detach_indirect(bool by_parent_cb)
 /* Set child relationships */
 bdrv_ref(b);
 bdrv_ref(a);
-child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_file, 0,
-&error_abort);
+child_b = bdrv_attach_child(parent_b, b, "PB-B", &child_of_bds,
+BDRV_CHILD_DATA, &error_abort);
 child_a = bdrv_attach_child(parent_b, a, "PB-A", &child_of_bds,
 BDRV_CHILD_COW, &error_abort);
 
 bdrv_ref(a);
 bdrv_attach_child(parent_a, a, "PA-A",
-  by_parent_cb ? &child_file : &detach_by_driver_cb_class,
-  0, &error_abort);
+  by_parent_cb ? &child_of_bds : &detach_by_driver_cb_class,
+  BDRV_CHILD_DATA, &error_abort);
 
 g_assert_cmpint(parent_a->refcnt, ==, 1);
 g_assert_cmpint(parent_b->refcnt, ==, 1);
diff --git a/tests/test-bdrv-graph-mod.c b/tests/test-bdrv-graph-mod.c
index 3707e2533c..6ae91ff171 100644
--- a/tests/test-bdrv-graph-mod.c
+++ b/tests/test-bdrv-graph-mod.c
@@ -112,7 +112,8 @@ static void test_update_perm_tree(void)
 
 blk_insert_bs(root, bs, &error_abort);
 
-bdrv_attach_child(filter, bs, "child", &child_file, 0, &error_abort);
+bdrv_attach_child(filter, bs, "child", &child_of_bds,
+  BDRV_CHILD_FI

[PATCH for-5.0 27/31] block: Make bdrv_filter_default_perms() static

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c   | 12 ++--
 include/block/block_int.h | 10 --
 2 files changed, 6 insertions(+), 16 deletions(-)

diff --git a/block.c b/block.c
index e800ce862e..b6c92ef283 100644
--- a/block.c
+++ b/block.c
@@ -2171,12 +2171,12 @@ int bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp)
 return bdrv_child_try_set_perm(c, perms, shared, errp);
 }
 
-void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
-   const BdrvChildClass *child_class,
-   BdrvChildRole role,
-   BlockReopenQueue *reopen_queue,
-   uint64_t perm, uint64_t shared,
-   uint64_t *nperm, uint64_t *nshared)
+static void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
+  const BdrvChildClass *child_class,
+  BdrvChildRole role,
+  BlockReopenQueue *reopen_queue,
+  uint64_t perm, uint64_t shared,
+  uint64_t *nperm, uint64_t *nshared)
 {
 *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
 *nshared = (shared & DEFAULT_PERM_PASSTHROUGH) | DEFAULT_PERM_UNCHANGED;
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 895bcf4d30..7818734708 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1243,16 +1243,6 @@ int bdrv_child_try_set_perm(BdrvChild *c, uint64_t perm, uint64_t shared,
  */
 int bdrv_child_refresh_perms(BlockDriverState *bs, BdrvChild *c, Error **errp);
 
-/* Default implementation for BlockDriver.bdrv_child_perm() that can be used by
- * block filters: Forward CONSISTENT_READ, WRITE, WRITE_UNCHANGED and RESIZE to
- * all children */
-void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
-   const BdrvChildClass *child_class,
-   BdrvChildRole child_role,
-   BlockReopenQueue *reopen_queue,
-   uint64_t perm, uint64_t shared,
-   uint64_t *nperm, uint64_t *nshared);
-
 /* Default implementation for BlockDriver.bdrv_child_perm() that can be used by
  * (non-raw) image formats: Like above for bs->backing, but for bs->file it
  * requires WRITE | RESIZE for read-write images, always requires
-- 
2.23.0

[PATCH for-5.0 14/31] block: Pull out bdrv_default_perms_for_storage()

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c | 71 +
 1 file changed, 46 insertions(+), 25 deletions(-)

diff --git a/block.c b/block.c
index 2771bc45ce..4d4ccbacdf 100644
--- a/block.c
+++ b/block.c
@@ -2294,6 +2294,50 @@ static void 
bdrv_default_perms_for_backing(BlockDriverState *bs, BdrvChild *c,
 *nshared = shared;
 }
 
+static void bdrv_default_perms_for_storage(BlockDriverState *bs, BdrvChild *c,
+   const BdrvChildClass *child_class,
+   BdrvChildRole role,
+   BlockReopenQueue *reopen_queue,
+   uint64_t perm, uint64_t shared,
+   uint64_t *nperm, uint64_t *nshared)
+{
+int flags;
+
+assert(child_class == &child_file ||
+   (child_class == &child_of_bds &&
+(role & (BDRV_CHILD_METADATA | BDRV_CHILD_DATA))));
+
+flags = bdrv_reopen_get_flags(reopen_queue, bs);
+
+/*
+ * Apart from the modifications below, the same permissions are
+ * forwarded and left alone as for filters
+ */
+bdrv_filter_default_perms(bs, c, child_class, role, reopen_queue,
+  perm, shared, &perm, &shared);
+
+/* Format drivers may touch metadata even if the guest doesn't write */
+if (bdrv_is_writable_after_reopen(bs, reopen_queue)) {
+perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+}
+
+/*
+ * bs->file always needs to be consistent because of the metadata. We
+ * can never allow other users to resize or write to it.
+ */
+if (!(flags & BDRV_O_NO_IO)) {
+perm |= BLK_PERM_CONSISTENT_READ;
+}
+shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
+
+if (bs->open_flags & BDRV_O_INACTIVE) {
+shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
+}
+
+*nperm = perm;
+*nshared = shared;
+}
+
 void bdrv_format_default_perms(BlockDriverState *bs, BdrvChild *c,
const BdrvChildClass *child_class,
BdrvChildRole role,
@@ -2305,31 +2349,8 @@ void bdrv_format_default_perms(BlockDriverState *bs, 
BdrvChild *c,
 assert(child_class == &child_backing || child_class == &child_file);
 
 if (!backing) {
-int flags = bdrv_reopen_get_flags(reopen_queue, bs);
-
-/* Apart from the modifications below, the same permissions are
- * forwarded and left alone as for filters */
-bdrv_filter_default_perms(bs, c, child_class, role, reopen_queue,
-  perm, shared, &perm, &shared);
-
-/* Format drivers may touch metadata even if the guest doesn't write */
-if (bdrv_is_writable_after_reopen(bs, reopen_queue)) {
-perm |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
-}
-
-/* bs->file always needs to be consistent because of the metadata. We
- * can never allow other users to resize or write to it. */
-if (!(flags & BDRV_O_NO_IO)) {
-perm |= BLK_PERM_CONSISTENT_READ;
-}
-shared &= ~(BLK_PERM_WRITE | BLK_PERM_RESIZE);
-
-if (bs->open_flags & BDRV_O_INACTIVE) {
-shared |= BLK_PERM_WRITE | BLK_PERM_RESIZE;
-}
-
-*nperm = perm;
-*nshared = shared;
+bdrv_default_perms_for_storage(bs, c, child_class, role, reopen_queue,
+   perm, shared, nperm, nshared);
 } else {
 bdrv_default_perms_for_backing(bs, c, child_class, role, reopen_queue,
perm, shared, nperm, nshared);
-- 
2.23.0

[PATCH for-5.0 26/31] block: Use bdrv_default_perms()

2019-11-27 Thread Max Reitz

bdrv_default_perms() can decide which permission profile to use based on
the BdrvChildRole, so block drivers do not need to select it explicitly.

The blkverify driver now no longer shares the WRITE permission for the
image to verify.  We thus have to adjust two places in
test-block-iothread not to take it.  (Note that in theory, blkverify
should behave like quorum in this regard and share neither WRITE nor
RESIZE for both of its children.  In practice, it does not really
matter, because blkverify is used only for debugging, so we might as
well keep its permissions rather liberal.)
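
As a rough illustration of what selecting a permission profile from the role could look like, here is a minimal sketch. The flag names, the enum, and pick_profile() are hypothetical stand-ins, and the order of the checks is an assumption rather than something taken from this patch:

```c
#include <assert.h>

/* Hypothetical role bits; the real BDRV_CHILD_* flags live in QEMU's headers. */
enum {
    ROLE_DATA     = 1u << 0,
    ROLE_METADATA = 1u << 1,
    ROLE_FILTERED = 1u << 2,
    ROLE_COW      = 1u << 3,
};

/* Hypothetical names for the three permission profiles. */
enum perm_profile {
    PROFILE_FILTER,
    PROFILE_BACKING,
    PROFILE_STORAGE,
};

/* Pick a profile from the role bits alone, without looking at a child class. */
static enum perm_profile pick_profile(unsigned role)
{
    if (role & ROLE_FILTERED) {
        return PROFILE_FILTER;
    }
    if (role & ROLE_COW) {
        return PROFILE_BACKING;
    }
    /* Anything else is expected to carry data and/or metadata. */
    assert(role & (ROLE_DATA | ROLE_METADATA));
    return PROFILE_STORAGE;
}
```

With a dispatch of this kind, a driver only has to pass the role through and no longer names a specific helper.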

Signed-off-by: Max Reitz 
---
 block/backup-top.c  |  4 ++--
 block/blkdebug.c|  4 ++--
 block/blklogwrites.c|  9 ++---
 block/blkreplay.c   |  2 +-
 block/blkverify.c   |  2 +-
 block/bochs.c   |  2 +-
 block/cloop.c   |  2 +-
 block/crypto.c  |  2 +-
 block/dmg.c |  2 +-
 block/parallels.c   |  2 +-
 block/qcow.c|  2 +-
 block/qcow2.c   |  2 +-
 block/qed.c |  2 +-
 block/raw-format.c  |  2 +-
 block/throttle.c|  2 +-
 block/vdi.c |  2 +-
 block/vhdx.c|  2 +-
 block/vmdk.c|  2 +-
 block/vpc.c |  2 +-
 tests/test-bdrv-drain.c | 10 +-
 tests/test-bdrv-graph-mod.c |  2 +-
 tests/test-block-iothread.c | 17 ++---
 22 files changed, 42 insertions(+), 36 deletions(-)

diff --git a/block/backup-top.c b/block/backup-top.c
index eccd6bfae0..54e499744f 100644
--- a/block/backup-top.c
+++ b/block/backup-top.c
@@ -149,8 +149,8 @@ static void backup_top_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 *nperm = BLK_PERM_WRITE;
 } else {
 /* Source child */
-bdrv_filter_default_perms(bs, c, child_class, role, reopen_queue,
-  perm, shared, nperm, nshared);
+bdrv_default_perms(bs, c, child_class, role, reopen_queue,
+   perm, shared, nperm, nshared);
 
 if (perm & BLK_PERM_WRITE) {
 *nperm = *nperm | BLK_PERM_CONSISTENT_READ;
diff --git a/block/blkdebug.c b/block/blkdebug.c
index b31fa40b0e..a925d8295e 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -1003,8 +1003,8 @@ static void blkdebug_child_perm(BlockDriverState *bs, 
BdrvChild *c,
 {
 BDRVBlkdebugState *s = bs->opaque;
 
-bdrv_filter_default_perms(bs, c, child_class, role, reopen_queue,
-  perm, shared, nperm, nshared);
+bdrv_default_perms(bs, c, child_class, role, reopen_queue,
+   perm, shared, nperm, nshared);
 
 *nperm |= s->take_child_perms;
 *nshared &= ~s->unshare_child_perms;
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 3ee991b38e..48091c2788 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -296,13 +296,8 @@ static void blk_log_writes_child_perm(BlockDriverState 
*bs, BdrvChild *c,
 return;
 }
 
-if (!strcmp(c->name, "log")) {
-bdrv_format_default_perms(bs, c, child_class, role, ro_q, perm, shrd,
-  nperm, nshrd);
-} else {
-bdrv_filter_default_perms(bs, c, child_class, role, ro_q, perm, shrd,
-  nperm, nshrd);
-}
+bdrv_default_perms(bs, c, child_class, role, ro_q, perm, shrd,
+   nperm, nshrd);
 }
 
 static void blk_log_writes_refresh_limits(BlockDriverState *bs, Error **errp)
diff --git a/block/blkreplay.c b/block/blkreplay.c
index 71628f4d56..be8cdb6b60 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -138,7 +138,7 @@ static BlockDriver bdrv_blkreplay = {
 .instance_size  = 0,
 
 .bdrv_open  = blkreplay_open,
-.bdrv_child_perm= bdrv_filter_default_perms,
+.bdrv_child_perm= bdrv_default_perms,
 .bdrv_getlength = blkreplay_getlength,
 
 .bdrv_co_preadv = blkreplay_co_preadv,
diff --git a/block/blkverify.c b/block/blkverify.c
index 7df4cb8007..4192f79d89 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -321,7 +321,7 @@ static BlockDriver bdrv_blkverify = {
 .bdrv_parse_filename  = blkverify_parse_filename,
 .bdrv_file_open   = blkverify_open,
 .bdrv_close   = blkverify_close,
-.bdrv_child_perm  = bdrv_filter_default_perms,
+.bdrv_child_perm  = bdrv_default_perms,
 .bdrv_getlength   = blkverify_getlength,
 .bdrv_refresh_filename= blkverify_refresh_filename,
 .bdrv_dirname = blkverify_dirname,
diff --git a/block/bochs.c b/block/bochs.c
index 15f9807954..96779743b2 100644
--- a/block/bochs.c
+++ b/block/bochs.c
@@ -297,7 +297,7 @@ static BlockDriver bdrv_bochs = {
 .instance_size = sizeof(BDRVBochsState),
 .bdrv_probe

[PATCH for-5.0 31/31] block: Drop @child_class from bdrv_child_perm()

2019-11-27 Thread Max Reitz

Implementations should decide the necessary permissions based on @role.

Signed-off-by: Max Reitz 
---
 block.c | 45 -
 block/backup-top.c  |  3 +--
 block/blkdebug.c|  3 +--
 block/blklogwrites.c|  3 +--
 block/commit.c  |  1 -
 block/copy-on-read.c|  1 -
 block/mirror.c  |  1 -
 block/quorum.c  |  1 -
 block/replication.c |  1 -
 block/vvfat.c   |  4 +---
 include/block/block_int.h   |  4 +---
 tests/test-bdrv-drain.c | 19 +---
 tests/test-bdrv-graph-mod.c |  1 -
 13 files changed, 25 insertions(+), 62 deletions(-)

diff --git a/block.c b/block.c
index b2dc05b028..6780904fad 100644
--- a/block.c
+++ b/block.c
@@ -1744,13 +1744,13 @@ bool bdrv_is_writable(BlockDriverState *bs)
 }
 
 static void bdrv_child_perm(BlockDriverState *bs, BlockDriverState *child_bs,
-BdrvChild *c, const BdrvChildClass *child_class,
-BdrvChildRole role, BlockReopenQueue *reopen_queue,
+BdrvChild *c, BdrvChildRole role,
+BlockReopenQueue *reopen_queue,
 uint64_t parent_perm, uint64_t parent_shared,
 uint64_t *nperm, uint64_t *nshared)
 {
 assert(bs->drv && bs->drv->bdrv_child_perm);
-bs->drv->bdrv_child_perm(bs, c, child_class, role, reopen_queue,
+bs->drv->bdrv_child_perm(bs, c, role, reopen_queue,
  parent_perm, parent_shared,
  nperm, nshared);
 /* TODO Take force_share from reopen_queue */
@@ -1844,7 +1844,7 @@ static int bdrv_check_perm(BlockDriverState *bs, 
BlockReopenQueue *q,
 uint64_t cur_perm, cur_shared;
 bool child_tighten_restr;
 
-bdrv_child_perm(bs, c->bs, c, c->klass, c->role, q,
+bdrv_child_perm(bs, c->bs, c, c->role, q,
 cumulative_perms, cumulative_shared_perms,
 &cur_perm, &cur_shared);
 ret = bdrv_child_check_perm(c, q, cur_perm, cur_shared, 
ignore_children,
@@ -1911,7 +1911,7 @@ static void bdrv_set_perm(BlockDriverState *bs, uint64_t 
cumulative_perms,
 /* Update all children */
 QLIST_FOREACH(c, &bs->children, next) {
 uint64_t cur_perm, cur_shared;
-bdrv_child_perm(bs, c->bs, c, c->klass, c->role, NULL,
+bdrv_child_perm(bs, c->bs, c, c->role, NULL,
 cumulative_perms, cumulative_shared_perms,
 &cur_perm, &cur_shared);
 bdrv_child_set_perm(c, cur_perm, cur_shared);
@@ -2138,14 +2138,13 @@ int bdrv_child_refresh_perms(BlockDriverState *bs, 
BdrvChild *c, Error **errp)
 uint64_t perms, shared;
 
 bdrv_get_cumulative_perm(bs, &parent_perms, &parent_shared);
-bdrv_child_perm(bs, c->bs, c, c->klass, c->role, NULL,
+bdrv_child_perm(bs, c->bs, c, c->role, NULL,
 parent_perms, parent_shared, &perms, &shared);
 
 return bdrv_child_try_set_perm(c, perms, shared, errp);
 }
 
 static void bdrv_filter_default_perms(BlockDriverState *bs, BdrvChild *c,
-  const BdrvChildClass *child_class,
   BdrvChildRole role,
   BlockReopenQueue *reopen_queue,
   uint64_t perm, uint64_t shared,
@@ -2156,13 +2155,12 @@ static void bdrv_filter_default_perms(BlockDriverState 
*bs, BdrvChild *c,
 }
 
 static void bdrv_default_perms_for_backing(BlockDriverState *bs, BdrvChild *c,
-   const BdrvChildClass *child_class,
BdrvChildRole role,
BlockReopenQueue *reopen_queue,
uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
 {
-assert(child_class == &child_of_bds && (role & BDRV_CHILD_COW));
+assert(role & BDRV_CHILD_COW);
 
 /*
  * We want consistent read from backing files if the parent needs it.
@@ -2193,7 +2191,6 @@ static void 
bdrv_default_perms_for_backing(BlockDriverState *bs, BdrvChild *c,
 }
 
 static void bdrv_default_perms_for_metadata(BlockDriverState *bs, BdrvChild *c,
-const BdrvChildClass *child_class,
 BdrvChildRole role,
 BlockReopenQueue *reopen_queue,
 uint64_t perm, uint64_t shared,
@@ -2201,7 +2198,7 @@ static void 
bdrv_default_perms_for_metadata(BlockDriverState *bs, BdrvChild *c,
 {
 int flags;
 
-assert(child_class == &child_of_bds && (role & BDRV_CHILD_METADATA));
+assert(role & BDRV_CHILD_METADATA);
 
 flags = bdrv_reopen_get_flags(reo

[PATCH for-5.0 18/31] block: Switch child_format users to child_of_bds

2019-11-27 Thread Max Reitz

Both users (quorum and blkverify) use child_format for
not-really-filtered children, so the appropriate BdrvChildRole in both
cases is DATA | FORMAT.

Signed-off-by: Max Reitz 
---
 block/blkverify.c | 5 +++--
 block/quorum.c| 9 ++---
 2 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/block/blkverify.c b/block/blkverify.c
index ba4f6d7b7c..4f4d079b12 100644
--- a/block/blkverify.c
+++ b/block/blkverify.c
@@ -134,8 +134,9 @@ static int blkverify_open(BlockDriverState *bs, QDict 
*options, int flags,
 
 /* Open the test file */
 s->test_file = bdrv_open_child(qemu_opt_get(opts, "x-image"), options,
-   "test", bs, &child_format, 0, false,
-   &local_err);
+   "test", bs, &child_of_bds,
+   BDRV_CHILD_DATA | BDRV_CHILD_FORMAT,
+   false, &local_err);
 if (local_err) {
 ret = -EINVAL;
 error_propagate(errp, local_err);
diff --git a/block/quorum.c b/block/quorum.c
index a6b2d73668..29f7a14e7c 100644
--- a/block/quorum.c
+++ b/block/quorum.c
@@ -997,8 +997,10 @@ static int quorum_open(BlockDriverState *bs, QDict 
*options, int flags,
 assert(ret < 32);
 
 s->children[i].child = bdrv_open_child(NULL, options, indexstr, bs,
-   &child_format, 0, false,
-   &local_err);
+   &child_of_bds,
+   BDRV_CHILD_DATA |
+   BDRV_CHILD_FORMAT,
+   false, &local_err);
 if (local_err) {
 ret = -EINVAL;
 goto close_exit;
@@ -1074,7 +1076,8 @@ static void quorum_add_child(BlockDriverState *bs, 
BlockDriverState *child_bs,
 /* We can safely add the child now */
 bdrv_ref(child_bs);
 
-child = bdrv_attach_child(bs, child_bs, indexstr, &child_format, 0, errp);
+child = bdrv_attach_child(bs, child_bs, indexstr, &child_of_bds,
+  BDRV_CHILD_DATA | BDRV_CHILD_FORMAT, errp);
 if (child == NULL) {
 s->next_child_index--;
 goto out;
-- 
2.23.0

[PATCH for-5.0 29/31] block: Drop child_file

2019-11-27 Thread Max Reitz

Signed-off-by: Max Reitz 
---
 block.c   | 30 +-
 include/block/block_int.h |  1 -
 tests/test-bdrv-drain.c   |  8 +++-
 3 files changed, 4 insertions(+), 35 deletions(-)

diff --git a/block.c b/block.c
index 3fcd56aaae..b2dc05b028 100644
--- a/block.c
+++ b/block.c
@@ -1077,33 +1077,6 @@ const BdrvChildClass child_of_bds = {
 .update_filename = bdrv_child_cb_update_filename,
 };
 
-/*
- * Returns the options and flags that bs->file should get if a protocol driver
- * is expected, based on the given options and flags for the parent BDS
- */
-static void bdrv_inherited_file_options(BdrvChildRole role,
-int *child_flags, QDict *child_options,
-int parent_flags, QDict 
*parent_options)
-{
-bdrv_inherited_options(BDRV_CHILD_IMAGE,
-   child_flags, child_options,
-   parent_flags, parent_options);
-}
-
-const BdrvChildClass child_file = {
-.parent_is_bds   = true,
-.get_parent_desc = bdrv_child_get_parent_desc,
-.inherit_options = bdrv_inherited_file_options,
-.drained_begin   = bdrv_child_cb_drained_begin,
-.drained_poll= bdrv_child_cb_drained_poll,
-.drained_end = bdrv_child_cb_drained_end,
-.attach  = bdrv_child_cb_attach,
-.detach  = bdrv_child_cb_detach,
-.inactivate  = bdrv_child_cb_inactivate,
-.can_set_aio_ctx = bdrv_child_cb_can_set_aio_ctx,
-.set_aio_ctx = bdrv_child_cb_set_aio_ctx,
-};
-
 static void bdrv_backing_attach(BdrvChild *c)
 {
 BlockDriverState *parent = c->opaque;
@@ -2228,8 +2201,7 @@ static void 
bdrv_default_perms_for_metadata(BlockDriverState *bs, BdrvChild *c,
 {
 int flags;
 
-assert(child_class == &child_file ||
-   (child_class == &child_of_bds && (role & BDRV_CHILD_METADATA)));
+assert(child_class == &child_of_bds && (role & BDRV_CHILD_METADATA));
 
 flags = bdrv_reopen_get_flags(reopen_queue, bs);
 
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 05e7a27318..04eab10eda 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -730,7 +730,6 @@ struct BdrvChildClass {
 };
 
 extern const BdrvChildClass child_of_bds;
-extern const BdrvChildClass child_file;
 
 struct BdrvChild {
 BlockDriverState *bs;
diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c
index 0da5a3a6a1..655fd0d085 100644
--- a/tests/test-bdrv-drain.c
+++ b/tests/test-bdrv-drain.c
@@ -93,12 +93,10 @@ static void bdrv_test_child_perm(BlockDriverState *bs, 
BdrvChild *c,
  uint64_t *nperm, uint64_t *nshared)
 {
 /*
- * bdrv_default_perms() accepts only these two, so disguise
- * detach_by_driver_cb_parent as one of them.
+ * bdrv_default_perms() accepts nothing else, so disguise
+ * detach_by_driver_cb_parent.
  */
-if (child_class != &child_file && child_class != &child_of_bds) {
-child_class = &child_of_bds;
-}
+child_class = &child_of_bds;
 
 bdrv_default_perms(bs, c, child_class, role, reopen_queue,
perm, shared, nperm, nshared);
-- 
2.23.0

[PATCH for 4.2?] .travis.yml: drop xcode9.4 from build matrix

2019-11-27 Thread Alex Bennée

It's broken so it's no longer helping. The latest Xcode is covered by
Cirrus.

Signed-off-by: Alex Bennée 
---
 .travis.yml | 8 +---
 1 file changed, 1 insertion(+), 7 deletions(-)

diff --git a/.travis.yml b/.travis.yml
index c09b6a00143..445b0646c18 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -237,13 +237,7 @@ matrix:
 - TEST_CMD=""
 
 
-# MacOSX builds
-- env:
-- CONFIG="--target-list=${MAIN_SOFTMMU_TARGETS}"
-  os: osx
-  osx_image: xcode9.4
-  compiler: clang
-
+# MacOSX builds - cirrus.yml also tests some MacOS builds including latest 
Xcode
 
 - env:
 - 
CONFIG="--target-list=i386-softmmu,ppc-softmmu,ppc64-softmmu,m68k-softmmu,x86_64-softmmu"
-- 
2.20.1

Re: QMP netdev_add multiple dnssearch values

2019-11-27 Thread Markus Armbruster

Alex Kirillov  writes:

>> What exactly goes wrong? Does the QMP command fail? Does it succeed
>> but configure the network backend incorrectly?
>
> The QMP command successfully creates the Slirp backend, but ignores these arguments entirely:
> - `dnssearch`
> - `hostfwd`
> - `guestfwd`

You're right, QMP command netdev_add silently ignores arguments that
aren't string, number, or bool, i.e. exactly the three you quoted.  Has
always been that way, as far as I can tell.

> For example, the `dnssearch` field of `NetdevUserOptions` goes straight to the
> function `slirp_dnssearch` (net/slirp.c), where it is converted to `char **`.
> But at that point the parameter is simply NULL when I pass anything other
> than a simple string.
>
> This is very strange, because the type of these parameters is `StringList`,
> which should require something like [{"str": "a"}, {"str": "b"}].

During our push to get QMP feature-complete, we took some shortcuts.
One of them is qmp_netdev_add().

Objective back then: provide a QMP command for the existing netdev
configuration machinery net_client_init().  Due to its roots in CLI,
net_client_init() takes a QemuOpts.

Proper solution: define a QAPI schema, rewrite net_client_init() to take
the resulting QAPI type instead of QemuOpts, make existing users convert
from QemuOpts to the QAPI type, have qmp_netdev_add() take the QAPI type
as argument, and pass it to net_client_init().  Too much work.

Shortcut: use 'gen': false to bypass generated marshaling, marshal by
hand into a QemuOpts, so we can call unmodified net_client_init().  That
became commit 928059a37b "qapi: convert netdev_add".

The "marshal by hand into a QemuOpts" uses qemu_opts_from_qdict(), which
goes back to similarly shortcut QMP command device_add:

commit 01e7f18869c9ee4c84793f4a39ec1f5f4128a0aa
Author: Markus Armbruster 
Date:   Wed Feb 10 20:15:29 2010 +0100

qemu-option: Functions to convert to/from QDict

The functions are somewhat restricted.  Good enough for the job at
hand.  We'll extend them when we need more.

"Good enough" was true back then.  It wasn't true when we reused it for
netdev_add: hostfwd and guestfwd are list-valued.
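
To make the symptom concrete, here is a minimal sketch of a scalar-only conversion that drops list values without reporting an error. The types and the function keep_scalars() are hypothetical and are not QEMU's QDict or QemuOpts code; they only mimic the behavior described above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value kinds; not QEMU's QObject types. */
enum kind { KIND_STRING, KIND_NUMBER, KIND_BOOL, KIND_LIST };

struct entry {
    const char *key;
    enum kind kind;
};

/*
 * Copy the scalar entries of in[] to out[] and skip list-valued ones.
 * Returns the number of entries kept. out[] must have room for n entries.
 */
static size_t keep_scalars(const struct entry *in, size_t n, struct entry *out)
{
    size_t kept = 0;

    for (size_t i = 0; i < n; i++) {
        if (in[i].kind == KIND_LIST) {
            continue; /* dropped silently: the caller never learns about it */
        }
        out[kept++] = in[i];
    }
    return kept;
}
```

A list-valued key such as dnssearch would vanish at this step, so the code that consumes the result later sees it as unset.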

We did define a QAPI schema a few months later (14aa0c2de0 "qapi schema:
add Netdev types").  net_client_init() uses it to convert from QemuOpts
to QAPI type Netdev.  This took us to the crazy pipeline we still use
today:

                        CLI, HMP
                    (key=value,...)
                           |
                           v
QMP (JSON) -> QDict -> QemuOpts -> Netdev

We should instead use:

                      CLI, HMP
                  (key=value,...)
                         |
                         v
                      QemuOpts
                         |
                         v
QMP (JSON) -> QDict -> Netdev

Back in 2016, Eric (cc'ed) posted patches to get us to this pipeline.
They got stuck on backward compatibility worries: the old code accepts
all parameters as JSON strings in addition to their proper type, the new
code doesn't.  Undocumented misfeature, but we chickened out anyway.

Let's reconsider.  Eric's patches break interface misuse that may or may
not exist in the field.  They fix a correct use of the interface that
people want (or Alex wouldn't have reported this bug), and they make QMP
introspection work for netdev_add.

Eric, what do you think?

Re: [PATCH for 4.2?] .travis.yml: drop xcode9.4 from build matrix

2019-11-27 Thread Philippe Mathieu-Daudé

On Wed, 27 Nov 2019 at 14:51, Alex Bennée wrote:

> It's broken so it's no longer helping. The latest Xcode is covered by
> Cirrus.
>
> Signed-off-by: Alex Bennée 
>

Reviewed-by: Philippe Mathieu-Daudé 

---
>  .travis.yml | 8 +---
>  1 file changed, 1 insertion(+), 7 deletions(-)
>
> diff --git a/.travis.yml b/.travis.yml
> index c09b6a00143..445b0646c18 100644
> --- a/.travis.yml
> +++ b/.travis.yml
> @@ -237,13 +237,7 @@ matrix:
>  - TEST_CMD=""
>
>
> -# MacOSX builds
> -- env:
> -- CONFIG="--target-list=${MAIN_SOFTMMU_TARGETS}"
> -  os: osx
> -  osx_image: xcode9.4
> -  compiler: clang
> -
> +# MacOSX builds - cirrus.yml also tests some MacOS builds including
> latest Xcode
>
>  - env:
>  -
> CONFIG="--target-list=i386-softmmu,ppc-softmmu,ppc64-softmmu,m68k-softmmu,x86_64-softmmu"
> --
> 2.20.1
>
>
>

Re: [PATCH v3 5/6] iotests: Enable more tests in the 'auto' group to improve test coverage

2019-11-27 Thread Thomas Huth


On 24/10/2019 13.14, Alex Bennée wrote:
> Thomas Huth writes:
>
>> According to Max, it would be good to have a test for iothreads and
>> migration. 127 and 256 seem to be good candidates for iothreads. For
>> migration, let's enable 091, 181, and 203 (which also tests iothreads).
>>
>> @@ -112,7 +112,7 @@
>>   088 rw quick
>>   089 rw auto quick
>>   090 rw auto quick
>> -091 rw migration
>> +091 rw auto migration
>
> This is breaking consistently on my ZFS machine:
>
>   TEST    iotest-qcow2: 091 [fail]

OK, I'll drop 091 again from the auto group in the next version of this
series.

 Thomas

Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-27 Thread Beata Michalska

Hi

On Wed, 27 Nov 2019 at 12:47, Xiang Zheng  wrote:
>
> Hi Beata,
>
> Thanks for your review!
>
YAW

> On 2019/11/22 23:47, Beata Michalska wrote:
> > Hi,
> >
> > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:
> >>
> >> From: Dongjiu Geng 
> >>
> >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> >> translates the host VA delivered by host to guest PA, then fills this PA
> >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> >> type.
> >>
> >> When guest accesses the poisoned memory, it will generate a Synchronous
> >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> >> memory_failure() to unmap the affected page in stage 2, finally
> >> returns to guest.
> >>
> >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> >> Qemu, Qemu records this error address into guest APEI GHES memory and
> >> notifies guest using Synchronous-External-Abort(SEA).
> >>
> >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> >> in which we can setup the type of exception and the syndrome information.
> >> When switching to guest, the target vcpu will jump to the synchronous
> >> external abort vector table entry.
> >>
> >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> >> not valid and hold an UNKNOWN value. These values will be set to KVM
> >> register structures through KVM_SET_ONE_REG IOCTL.
> >>
> >> Signed-off-by: Dongjiu Geng 
> >> Signed-off-by: Xiang Zheng 
> >> Reviewed-by: Michael S. Tsirkin 
> >> ---
> >>  hw/acpi/acpi_ghes.c | 297 
> >>  include/hw/acpi/acpi_ghes.h |   4 +
> >>  include/sysemu/kvm.h|   3 +-
> >>  target/arm/cpu.h|   4 +
> >>  target/arm/helper.c |   2 +-
> >>  target/arm/internals.h  |   5 +-
> >>  target/arm/kvm64.c  |  64 
> >>  target/arm/tlb_helper.c |   2 +-
> >>  target/i386/cpu.h   |   2 +
> >>  9 files changed, 377 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/hw/acpi/acpi_ghes.c b/hw/acpi/acpi_ghes.c
> >> index 42c00ff3d3..f5b54990c0 100644
> >> --- a/hw/acpi/acpi_ghes.c
> >> +++ b/hw/acpi/acpi_ghes.c
> >> @@ -39,6 +39,34 @@
> >>  /* The max size in bytes for one error block */
> >>  #define ACPI_GHES_MAX_RAW_DATA_LENGTH   0x1000
> >>
> >> +/*
> >> + * The total size of Generic Error Data Entry
> >> + * ACPI 6.1/6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-343 Generic Error Data Entry
> >> + */
> >> +#define ACPI_GHES_DATA_LENGTH   72
> >> +
> >> +/*
> >> + * The memory section CPER size,
> >> + * UEFI 2.6: N.2.5 Memory Error Section
> >> + */
> >> +#define ACPI_GHES_MEM_CPER_LENGTH   80
> >> +
> >> +/*
> >> + * Masks for block_status flags
> >> + */
> >> +#define ACPI_GEBS_UNCORRECTABLE 1
> >
> > Why not listing all supported statuses ? Similar to error severity below ?
> >
>
> We now only use the first bit for uncorrectable error. The correctable errors
> are handled in host and would not be delivered to QEMU.
>
> I think it's unnecessary to list all the bit masks.

I'm not sure we are using all the error severity types either, but fair enough.
>
> >> +
> >> +/*
> >> + * Values for error_severity field
> >> + */
> >> +enum AcpiGenericErrorSeverity {
> >> +ACPI_CPER_SEV_RECOVERABLE,
> >> +ACPI_CPER_SEV_FATAL,
> >> +ACPI_CPER_SEV_CORRECTED,
> >> +ACPI_CPER_SEV_NONE,
> >> +};
> >> +
> >>  /*
> >>   * Now only support ARMv8 SEA notification type error source
> >>   */
> >> @@ -49,6 +77,16 @@
> >>   */
> >>  #define ACPI_GHES_SOURCE_GENERIC_ERROR_V2   10
> >>
> >> +#define UUID_BE(a, b, c, d0, d1, d2, d3, d4, d5, d6, d7)\
> >> +{{{ ((a) >> 24) & 0xff, ((a) >> 16) & 0xff, ((a) >> 8) & 0xff, (a) & 
> >> 0xff, \
> >> +((b) >> 8) & 0xff, (b) & 0xff,   \
> >> +((c) >> 8) & 0xff, (c) & 0xff,\
> >> +(d0), (d1), (d2), (d3), (d4), (d5), (d6), (d7) } } }
> >> +
> >> +#define UEFI_CPER_SEC_PLATFORM_MEM   \
> >> +UUID_BE(0xA5BC1114, 0x6F64, 0x4EDE, 0xB8, 0x63, 0x3E, 0x83, \
> >> +0xED, 0x7C, 0x83, 0xB1)
> >> +
> >>  /*

As suggested in a different thread: could this also be made common with
the NVMe code?
> >>   * | +--+ 0
> >>   * | |Header|
> >> @@ -77,6 +115,174 @@ typedef struct AcpiGhesState {
> >>  uint64_t ghes_addr_le;
> >>  } AcpiGhesState;
> >>
> >> +/*
> >> + * Total size for Generic Error Status Block
> >> + * ACPI 6.2: 18.3.2.7.1 Generic Error Data,
> >> + * Table 18-380 Generic Error Status Block
> >> + */
> >> +#define ACPI_GHES_GESB_SIZE 20
> >
> > Minor: This is not entirely correct: GEDE is part of GESB so the total 
> > length
> > would be ACPI_GHES_GESB_SIZE + n* sizeof(GEDE)
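
The arithmetic behind that remark, as a small sketch. The two constants mirror the values quoted in the patch (20 bytes for the status block header, 72 bytes per Generic Error Data Entry); the macro names and the helper gesb_total_length() are hypothetical, and the per-entry section payload (for example the 80-byte memory CPER) is deliberately not counted:

```c
#include <assert.h>
#include <stddef.h>

#define GESB_HEADER_SIZE 20 /* ACPI_GHES_GESB_SIZE in the patch */
#define GEDE_SIZE        72 /* ACPI_GHES_DATA_LENGTH in the patch */

/* Status block header plus n data entry headers, payloads excluded. */
static size_t gesb_total_length(size_t n_entries)
{
    return GESB_HEADER_SIZE + n_entries * GEDE_SIZE;
}
```

So the 20 bytes describe only the fixed header; a block with a single data entry already needs 92 bytes before any section payload.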

Re: [PATCH v9 1/3] block: introduce compress filter driver

2019-11-27 Thread Max Reitz

On 26.11.19 15:43, Andrey Shinkevich wrote:
> Allow writing all the data compressed through the filter driver.
> The written data will be aligned by the cluster size.
> Based on the QEMU current implementation, that data can be written to
> unallocated clusters only. May be used for a backup job.
> 
> Suggested-by: Max Reitz 
> Signed-off-by: Andrey Shinkevich 
> ---
>  block/Makefile.objs |   1 +
>  block/filter-compress.c | 190 
> 
>  qapi/block-core.json|  10 ++-
>  3 files changed, 197 insertions(+), 4 deletions(-)
>  create mode 100644 block/filter-compress.c

[...]

> diff --git a/block/filter-compress.c b/block/filter-compress.c
> new file mode 100644
> index 000..ef4b12b
> --- /dev/null
> +++ b/block/filter-compress.c

[...]

> +#define PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
> +  | BLK_PERM_WRITE \
> +  | BLK_PERM_RESIZE)
> +#define PERM_UNCHANGED (BLK_PERM_ALL & ~PERM_PASSTHROUGH)
> +
> +static void compress_child_perm(BlockDriverState *bs, BdrvChild *c,
> +const BdrvChildRole *role,
> +BlockReopenQueue *reopen_queue,
> +uint64_t perm, uint64_t shared,
> +uint64_t *nperm, uint64_t *nshared)
> +{
> +*nperm = perm & PERM_PASSTHROUGH;
> +*nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
> +
> +/*
> + * We must not request write permissions for an inactive node, the child
> + * cannot provide it.
> + */
> +if (!(bs->open_flags & BDRV_O_INACTIVE)) {
> +*nperm |= BLK_PERM_WRITE_UNCHANGED;
> +}

The copy-on-read filter has to take the WRITE_UNCHANGED permission
because it will do such writes for every read, but I don’t think this
driver ever needs to take this permission.  Therefore it should be
enough to use bdrv_filter_default_perms for .bdrv_child_perm.

Max
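
For reference, a minimal sketch of what the pass-through computation yields once the extra WRITE_UNCHANGED block is dropped. The bit values and the helper filter_perms() are hypothetical; only the two mask definitions are modeled on the macros quoted from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit values; QEMU defines the real BLK_PERM_* constants. */
#define PERM_CONSISTENT_READ (1u << 0)
#define PERM_WRITE           (1u << 1)
#define PERM_WRITE_UNCHANGED (1u << 2)
#define PERM_RESIZE          (1u << 3)
#define PERM_ALL             0x0fu

/* Modeled on PERM_PASSTHROUGH and PERM_UNCHANGED in the quoted patch. */
#define PERM_PASSTHROUGH (PERM_CONSISTENT_READ | PERM_WRITE | PERM_RESIZE)
#define PERM_UNCHANGED   (PERM_ALL & ~PERM_PASSTHROUGH)

/* Forward the pass-through bits, and always share the remaining ones. */
static void filter_perms(uint64_t perm, uint64_t shared,
                         uint64_t *nperm, uint64_t *nshared)
{
    *nperm = perm & PERM_PASSTHROUGH;
    *nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
}
```

In this sketch the child is asked for WRITE_UNCHANGED in no case, whatever the parent requests, which matches the point that the compress filter never issues such writes on its own.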




Re: [RESEND PATCH v21 5/6] target-arm: kvm64: handle SIGBUS signal from kernel or KVM

2019-11-27 Thread Beata Michalska

On Wed, 27 Nov 2019 at 13:03, Igor Mammedov  wrote:
>
> On Wed, 27 Nov 2019 20:47:15 +0800
> Xiang Zheng  wrote:
>
> > Hi Beata,
> >
> > Thanks for your review!
> >
> > On 2019/11/22 23:47, Beata Michalska wrote:
> > > Hi,
> > >
> > > On Mon, 11 Nov 2019 at 01:48, Xiang Zheng  wrote:
> > >>
> > >> From: Dongjiu Geng 
> > >>
> > >> Add a SIGBUS signal handler. In this handler, it checks the SIGBUS type,
> > >> translates the host VA delivered by host to guest PA, then fills this PA
> > >> to guest APEI GHES memory, then notifies guest according to the SIGBUS
> > >> type.
> > >>
> > >> When guest accesses the poisoned memory, it will generate a Synchronous
> > >> External Abort(SEA). Then host kernel gets an APEI notification and calls
> > >> memory_failure() to unmap the affected page in stage 2, finally
> > >> returns to guest.
> > >>
> > >> Guest continues to access the PG_hwpoison page, it will trap to KVM as
> > >> stage2 fault, then a SIGBUS_MCEERR_AR synchronous signal is delivered to
> > >> Qemu, Qemu records this error address into guest APEI GHES memory and
> > >> notifies guest using Synchronous-External-Abort(SEA).
> > >>
> > >> In order to inject a vSEA, we introduce the kvm_inject_arm_sea() function
> > >> in which we can setup the type of exception and the syndrome information.
> > >> When switching to guest, the target vcpu will jump to the synchronous
> > >> external abort vector table entry.
> > >>
> > >> The ESR_ELx.DFSC is set to synchronous external abort(0x10), and the
> > >> ESR_ELx.FnV is set to not valid(0x1), which will tell guest that FAR is
> > >> not valid and hold an UNKNOWN value. These values will be set to KVM
> > >> register structures through KVM_SET_ONE_REG IOCTL.
> > >>
> > >> Signed-off-by: Dongjiu Geng 
> > >> Signed-off-by: Xiang Zheng 
> > >> Reviewed-by: Michael S. Tsirkin 
> > >> ---
> [...]
> > >> diff --git a/include/hw/acpi/acpi_ghes.h b/include/hw/acpi/acpi_ghes.h
> > >> index cb62ec9c7b..8e3c5b879e 100644
> > >> --- a/include/hw/acpi/acpi_ghes.h
> > >> +++ b/include/hw/acpi/acpi_ghes.h
> > >> @@ -24,6 +24,9 @@
> > >>
> > >>  #include "hw/acpi/bios-linker-loader.h"
> > >>
> > >> +#define ACPI_GHES_CPER_OK   1
> > >> +#define ACPI_GHES_CPER_FAIL 0
> > >> +
> > >
> > > Is there really a need to introduce those ?
> > >
> >
> > Don't you think it's more clear than using "1" or "0"? :)
>
> or maybe just reuse default libc return convention: 0 - ok, -1 - fail
> and drop custom macros
>

Totally agree.

BR
Beata
> >
> > >>  /*
> > >>   * Values for Hardware Error Notification Type field
> > >>   */
> [...]
>
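
A minimal sketch of the convention suggested above, assuming a hypothetical helper record_error(): it returns 0 on success and -1 on failure, so callers can test the result directly and no OK/FAIL macros are needed.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical recorder: store a physical address in *slot.
 * Returns 0 on success and -1 on failure, following the usual libc style.
 */
static int record_error(unsigned long long *slot, unsigned long long paddr)
{
    if (slot == NULL) {
        return -1;
    }
    *slot = paddr;
    return 0;
}
```

A caller would then write something like `if (record_error(&slot, paddr) < 0) { ... }`, which reads the same way as most other error checks in C code.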
