date:20211027

Re: [PATCH v4 3/3] multifd: Implement zerocopy write in multifd migration (multifd-zerocopy)

2021-10-27 Thread Peter Xu

On Wed, Oct 27, 2021 at 03:47:18AM -0300, Leonardo Bras Soares Passos wrote:
> On Wed, Oct 13, 2021 at 3:24 AM Peter Xu  wrote:
> >
> > On Sat, Oct 09, 2021 at 04:56:13AM -0300, Leonardo Bras wrote:
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 88f07baedd..c4890cbb54 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -724,6 +724,11 @@
> > >  #  will consume more CPU.
> > >  #  Defaults to 1. (Since 5.0)
> > >  #
> > > +# @multifd-zerocopy: Controls behavior on sending memory pages on 
> > > multifd migration.
> > > +#When true, enables a zerocopy mechanism for sending 
> > > memory
> > > +#pages, if host does support it.
> > > +#Defaults to false. (Since 6.2)
> > > +#
> >
> > Shall we keep it named "@zerocopy"?  Yes we have that dependency with 
> > multifd,
> > but it's fine to just fail the configuration if multifd not set. The thing 
> > is
> > we don't know whether that dependency will last forever, and we probably 
> > don't
> > want to introduce yet another feature bit when we can remove the 
> > dependency..
> > as we can't remove the old one to be compatible.
> 
> It makes sense not wanting to create a new future bit in the future,
> but if we just add a
> "@zerocopy' , wouldn't we need to fail every migration setup that
> don't support zerocopy?
> 
> (Thinking back, to stay as it is, it would also be necessary that I
> find a way to fail other multifd setups that don't support zerocopy,
> for v5)

Yes I think so; imho we can fail either whey applying the bit, or it's okay too
to fail at the start of migration.  Thanks,

-- 
Peter Xu

Re: [PATCH] hw/net: store timers for e1000 in vmstate

2021-10-27 Thread Pavel Dovgalyuk


On 27.10.2021 07:05, Jason Wang wrote:

On Tue, Oct 26, 2021 at 6:36 PM Pavel Dovgalyuk
 wrote:


Setting timers randomly when vmstate is loaded breaks
execution determinism.
Therefore this patch allows saving mit and autoneg timers
for e1000. It makes execution deterministic and allows
snapshotting and reverse debugging in icount mode.

Signed-off-by: Pavel Dovgalyuk 
---
  hw/net/e1000.c |   61 ++--
  1 file changed, 50 insertions(+), 11 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index a30546c5d5..2f706f7298 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -37,6 +37,7 @@
  #include "qemu/iov.h"
  #include "qemu/module.h"
  #include "qemu/range.h"
+#include "sysemu/replay.h"

  #include "e1000x_common.h"
  #include "trace.h"
@@ -1407,7 +1408,7 @@ static int e1000_pre_save(void *opaque)
   * complete auto-negotiation immediately. This allows us to look
   * at MII_SR_AUTONEG_COMPLETE to infer link status on load.
   */
-if (nc->link_down && have_autoneg(s)) {
+if (replay_mode == REPLAY_MODE_NONE && nc->link_down && have_autoneg(s)) {
  s->phy_reg[PHY_STATUS] |= MII_SR_AUTONEG_COMPLETE;
  }

@@ -1438,22 +1439,12 @@ static int e1000_post_load(void *opaque, int version_id)
  s->mac_reg[TADV] = 0;
  s->mit_irq_level = false;
  }
-s->mit_ide = 0;
-s->mit_timer_on = true;
-timer_mod(s->mit_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + 1);

  /* nc.link_down can't be migrated, so infer link_down according
   * to link status bit in mac_reg[STATUS].
   * Alternatively, restart link negotiation if it was in progress. */
  nc->link_down = (s->mac_reg[STATUS] & E1000_STATUS_LU) == 0;

-if (have_autoneg(s) &&
-!(s->phy_reg[PHY_STATUS] & MII_SR_AUTONEG_COMPLETE)) {
-nc->link_down = false;
-timer_mod(s->autoneg_timer,
-  qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 500);
-}


So we won't get those timers armed after migration unconditionally. Is
this intended?


Not really. I think there could be several solutions:
1. Save some flag to distinguish between old and new state.
2. Use deterministic version for icount (or even record/replay) mode only.
3. Check machine type to change the behavior (as Dave proposed)



Thanks


-
  s->tx.props = s->mig_props;
  if (!s->received_tx_tso) {
  /* We received only one set of offload data (tx.props)
@@ -1472,6 +1463,13 @@ static int e1000_tx_tso_post_load(void *opaque, int 
version_id)
  return 0;
  }

+static int e1000_mit_timer_post_load(void *opaque, int version_id)
+{
+E1000State *s = opaque;
+s->mit_timer_on = true;
+return 0;
+}
+
  static bool e1000_mit_state_needed(void *opaque)
  {
  E1000State *s = opaque;
@@ -1493,6 +1491,21 @@ static bool e1000_tso_state_needed(void *opaque)
  return chkflag(TSO);
  }

+static bool e1000_mit_timer_needed(void *opaque)
+{
+E1000State *s = opaque;
+
+return s->mit_timer_on;
+}
+
+static bool e1000_autoneg_timer_needed(void *opaque)
+{
+E1000State *s = opaque;
+
+return have_autoneg(s)
+   && !(s->phy_reg[PHY_STATUS] & MII_SR_AUTONEG_COMPLETE);
+}
+
  static const VMStateDescription vmstate_e1000_mit_state = {
  .name = "e1000/mit_state",
  .version_id = 1,
@@ -1541,6 +1554,30 @@ static const VMStateDescription 
vmstate_e1000_tx_tso_state = {
  }
  };

+static const VMStateDescription vmstate_e1000_mit_timer = {
+.name = "e1000/mit_timer",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = e1000_mit_timer_needed,
+.post_load = e1000_mit_timer_post_load,
+.fields = (VMStateField[]) {
+VMSTATE_TIMER_PTR(mit_timer, E1000State),
+VMSTATE_UINT32(mit_ide, E1000State),
+VMSTATE_END_OF_LIST()
+}
+};
+
+static const VMStateDescription vmstate_e1000_autoneg_timer = {
+.name = "e1000/autoneg_timer",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = e1000_autoneg_timer_needed,
+.fields = (VMStateField[]) {
+VMSTATE_TIMER_PTR(autoneg_timer, E1000State),
+VMSTATE_END_OF_LIST()
+}
+};
+
  static const VMStateDescription vmstate_e1000 = {
  .name = "e1000",
  .version_id = 2,
@@ -1622,6 +1659,8 @@ static const VMStateDescription vmstate_e1000 = {
  &vmstate_e1000_mit_state,
  &vmstate_e1000_full_mac_state,
  &vmstate_e1000_tx_tso_state,
+&vmstate_e1000_mit_timer,
+&vmstate_e1000_autoneg_timer,
  NULL
  }
  };

Re: [PATCH v2 0/3] memory: memory_region_is_mapped() cleanups

2021-10-27 Thread David Hildenbrand

On 27.10.21 05:53, Peter Xu wrote:
> On Tue, Oct 26, 2021 at 06:06:46PM +0200, David Hildenbrand wrote:
>> This is the follow-up of [1].
>>
>> Playing with memory_region_is_mapped(), I realized that memory regions
>> mapped via an alias behave a little bit "differently", as they don't have
>> their ->container set.
> 

Hi Peter,

thanks for your review!

> The patches look ok to me, though I have a few pure questions to ask..
> 
>> * memory_region_is_mapped() will never succeed for memory regions mapped
>>   via an alias
> 
> I think you mentioned that in commit message of patch 2 that it fixes no real
> problem so far, so I'm also wondering in which case it'll help.  Say, normally
> when there's an alias of another MR and we want to know whether the MR is
> mapped, we simply call memory_region_is_mapped() upon the alias .

Just to recap: in v1 I proposed to just document that it doesn't work on
aliases, and folks weren't too happy to see regions mapped via aliases
being special-cased where it might just be avoided.

> 
> To verify my thoughts, I did look up a few memory_region_is_mapped() random
> callers that used with alias and that's what they did:
> 
> Here'sthe dino.c example:
> 
> *** hw/hppa/dino.c:
> gsc_to_pci_forwarding[151] if (!memory_region_is_mapped(mem)) {
> gsc_to_pci_forwarding[155] } else if (memory_region_is_mapped(mem)) {
> 
> The "mem" points to:
> 
> MemoryRegion *mem = &s->pci_mem_alias[i];
> 
> Which is the alias.
> 
> Another one:
> 
> *** hw/pci-host/pnv_phb3.c:
> pnv_phb3_check_m32[121]if (memory_region_is_mapped(&phb->mr_m32)) {
> pnv_phb3_update_regions[1076]  if (memory_region_is_mapped(&phb->mr_m32)) {
> 
> Andmr_m32 is the alias MR itself:
> 
> memory_region_init_alias(&phb->mr_m32, OBJECT(phb), "phb3-m32",
>  &phb->pci_mmio, start, size);
> 
> I mean, if it should always be very straightforward to fetch the alias mr, 
> then
> I'm just afraid patch 2 won't really help in any real use case but pure 
> overhead.

That is true as long as it's not a mapping check in generic code. Say,
we have a RAMBlock and use ->mr. Checking
memory_region_is_mapped(rb->mr) is misleading if the region is mapped
via aliases.

The reason I stumbled over this at all is a sanity check I added in

void memory_region_set_ram_discard_manager(MemoryRegion *mr,
   RamDiscardManager *rdm)
{
g_assert(memory_region_is_ram(mr) && !memory_region_is_mapped(mr));
g_assert(!rdm || !mr->rdm);
mr->rdm = rdm;
}

If mr is only mapped via aliases (see the virtio-mem memslot series),
this check is of no value, because even if the mr would be mapped via
aliases, we would not be able to catch it.

Having that said, the check is not 100% correct, because
memory_region_is_mapped() does not indicate that we're actually mapped
into an address space. But at least for memory devices (-> target use
case of RamDiscardManager) with an underlying RAMBlock, it's pretty
reliable -- and there is no easy way to check if we're mapped into an
address space when aliases are involved.

Note that there is also a similar check in memory_region_is_mapped(),
but I'm removing that as part of the virtio-mem memslot series, because
it's not actually helpful in the context of RAMBlock migration (nothing
might be mapped, but migration code would still try migrating such a
RAMBlock and has to consider the RamDiscardManager).

Another example of a generic code check is patch #1: the only reason it
works right now is because NVDIMMs cannot exist before initial memory is
created. But yes, there is a better way of doing it when we have a
memdev at hand.

> 
> And I hope we won't trigger problem with any use case where
> memory_region_is_mapped() returned false previously but then it'll return true
> after patch 2, because logically with the old code one can detect explicitly 
> on
> "whether this original MR is mapped somewhere, irrelevant of other alias
> mappings upon this mr".  Patch 2 blurrs it from that pov.
> 

I hope tests will catch that early. I ran some without surprises.

>> * memory_region_to_address_space(), memory_region_find(),
>>   memory_region_find_rcu(), memory_region_present() won't work, which seems
>>   okay, because we don't expect such memory regions getting passed to these
>>   functions.
> 
> Looks right.
> 
>> * memory_region_to_absolute_addr() will result in a wrong address. As
>>   the result is only used for tracing, that is tolerable.
> 
> memory_region_{read|write}_accessor() seem to be only called from the address
> space layer, so it looks fine even for tracing as it'll always fetch the 
> alias,
> afaiu.  Phil's patch may change that fact, though, it seems.

Unfortunately, not much we can do: a memory region might theoretically
be mapped via aliases into different address spaces and into different
locations: there is no right answer to memory_region_to_absolute_addr()
when only having the memory region at ha

Re: [PATCH V4 1/3] net/filter: Remove vnet_hdr from filter-mirror and filter-redirector

2021-10-27 Thread Markus Armbruster

Jason Wang  writes:

> On Wed, Oct 27, 2021 at 2:40 PM Zhang, Chen  wrote:

[...]

>> > >> I wonder if we break management layer. If yes, maybe it's better to
>> > >> keep the vnet_hdr_support here.
>> > >
>> > > Yes and no,   With this series of patches, filters have ability to 
>> > > automatically
>> > > Configure the appropriate vnet_hdr_support flag according to the current 
>> > > environment.
>> > > And can report error when can't fixing the vnet_hdr(The user cannot fix 
>> > > it from the previous way ).
>> > > So I think no need for the user to configure this option, some relevant 
>> > > background knowledge required.
>> > >
>> > > For the management layer, keep the vnet_hdr_support may be meaningless 
>> > > except for compatibility.
>> > > In this situation, Do you think we still need to keep the 
>> > > vnet_hdr_support for management layer?
>> >
>> >
>> > So it depends on whether management layer like libvirt has already
>> > supported this. If yes, we may get errors using new qemu with old libvirt?
>>
>> As far as I know, Current management layer like upstream libvirt is no COLO 
>> official support yet.
>> And some real CSPs use libvirt passthrough qmp command to Qemu for manage 
>> COLO VM.
>
> So the question still, it looks to me it requires the modification of
> the layers on top of libvirt? If the answer is yes, we'd better keep
> that compatibility.

When in doubt, maintain compatibility.

We may want to deprecate parameters that have become unnecessary.

[...]

Re: [PATCH v7 0/4] VNC-related HMP/QMP fixes

2021-10-27 Thread Gerd Hoffmann

On Tue, Oct 26, 2021 at 01:32:29PM +0200, Markus Armbruster wrote:
> "Dr. David Alan Gilbert"  writes:
> 
> > * Markus Armbruster (arm...@redhat.com) wrote:
> >> I'm done reviewing.  David, care to have another look at the HMP part?
> >
> > Yep, looking good to me - is that going via qmp, hmp, or vnc ?
> 
> Either is fine with me.
> 
> David, Gerd, do you have anything queued up already?

Nothing queued atm, no objections to someone else picking this up.

Acked-by: Gerd Hoffmann 

take care,
  Gerd

Re: [PATCH v3 1/3] smbios: Rename SMBIOS_ENTRY_POINT_* enums

2021-10-27 Thread Michael S. Tsirkin

On Tue, Oct 26, 2021 at 11:10:58AM -0400, Eduardo Habkost wrote:
> Rename the enums to match the naming style used by QAPI, and to
> use "32" and "64" instead of "20" and "31".  This will allow us
> to more easily move the enum to the QAPI schema later.
> 
> About the naming choice: "SMBIOS 2.1 entry point"/"SMBIO 3.0

typo in commit log

> entry point" and "32-bit entry point"/"64-bit entry point" are
> synonymous in the SMBIOS specification.  However, the phrases
> "32-bit entry point" and "64-bit entry point" are used more often.
> 
> The new names also avoid confusion between the entry point format
> and the actual SMBIOS version reported in the entry point
> structure.  For example: currently the 32-bit entry point
> actually report SMBIOS 2.8 support, not 2.1.
> 
> Based on portions of a patch submitted by Daniel P. Berrangé.

I think you need the original S.O.B here too then.

> 
> Signed-off-by: Eduardo Habkost 
> ---
> First version of this code was submitted at:
> https://lore.kernel.org/qemu-devel/20200908165438.1008942-5-berra...@redhat.com
> 
> Changes from v2:
> * Use "32" and "64" instead of "2_0" and "3_1"
> 
> Changes from v1:
> * Patch was split in two
> * Hunks included this patch are not changed from v1
> ---
>  include/hw/firmware/smbios.h | 4 ++--
>  hw/arm/virt.c| 2 +-
>  hw/i386/pc_piix.c| 2 +-
>  hw/i386/pc_q35.c | 2 +-
>  hw/smbios/smbios.c   | 8 
>  5 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/include/hw/firmware/smbios.h b/include/hw/firmware/smbios.h
> index 5a0dd0c8cff..d916baed6a9 100644
> --- a/include/hw/firmware/smbios.h
> +++ b/include/hw/firmware/smbios.h
> @@ -27,8 +27,8 @@ struct smbios_phys_mem_area {
>   * SMBIOS spec defined tables
>   */
>  typedef enum SmbiosEntryPointType {
> -SMBIOS_ENTRY_POINT_21,
> -SMBIOS_ENTRY_POINT_30,
> +SMBIOS_ENTRY_POINT_TYPE_32,
> +SMBIOS_ENTRY_POINT_TYPE_64,
>  } SmbiosEntryPointType;
>  
>  /* SMBIOS Entry Point
> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> index ca433adb5b1..2bd73d501da 100644
> --- a/hw/arm/virt.c
> +++ b/hw/arm/virt.c
> @@ -1589,7 +1589,7 @@ static void virt_build_smbios(VirtMachineState *vms)
>  
>  smbios_set_defaults("QEMU", product,
>  vmc->smbios_old_sys_ver ? "1.0" : mc->name, false,
> -true, SMBIOS_ENTRY_POINT_30);
> +true, SMBIOS_ENTRY_POINT_TYPE_64);
>  
>  smbios_get_tables(MACHINE(vms), NULL, 0,
>&smbios_tables, &smbios_tables_len,
> diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
> index 6ad0d763c57..17c050694f5 100644
> --- a/hw/i386/pc_piix.c
> +++ b/hw/i386/pc_piix.c
> @@ -177,7 +177,7 @@ static void pc_init1(MachineState *machine,
>  smbios_set_defaults("QEMU", "Standard PC (i440FX + PIIX, 1996)",
>  mc->name, pcmc->smbios_legacy_mode,
>  pcmc->smbios_uuid_encoded,
> -SMBIOS_ENTRY_POINT_21);
> +SMBIOS_ENTRY_POINT_TYPE_32);
>  }
>  
>  /* allocate ram and load rom/bios */
> diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
> index fcc6e4eb2b8..48419ebfd5f 100644
> --- a/hw/i386/pc_q35.c
> +++ b/hw/i386/pc_q35.c
> @@ -199,7 +199,7 @@ static void pc_q35_init(MachineState *machine)
>  smbios_set_defaults("QEMU", "Standard PC (Q35 + ICH9, 2009)",
>  mc->name, pcmc->smbios_legacy_mode,
>  pcmc->smbios_uuid_encoded,
> -SMBIOS_ENTRY_POINT_21);
> +SMBIOS_ENTRY_POINT_TYPE_32);
>  }
>  
>  /* allocate ram and load rom/bios */
> diff --git a/hw/smbios/smbios.c b/hw/smbios/smbios.c
> index 7397e567373..6013df1698e 100644
> --- a/hw/smbios/smbios.c
> +++ b/hw/smbios/smbios.c
> @@ -62,7 +62,7 @@ uint8_t *smbios_tables;
>  size_t smbios_tables_len;
>  unsigned smbios_table_max;
>  unsigned smbios_table_cnt;
> -static SmbiosEntryPointType smbios_ep_type = SMBIOS_ENTRY_POINT_21;
> +static SmbiosEntryPointType smbios_ep_type = SMBIOS_ENTRY_POINT_TYPE_32;
>  
>  static SmbiosEntryPoint ep;
>  
> @@ -432,7 +432,7 @@ static void smbios_validate_table(MachineState *ms)
>  exit(1);
>  }
>  
> -if (smbios_ep_type == SMBIOS_ENTRY_POINT_21 &&
> +if (smbios_ep_type == SMBIOS_ENTRY_POINT_TYPE_32 &&
>  smbios_tables_len > SMBIOS_21_MAX_TABLES_LEN) {
>  error_report("SMBIOS 2.1 table length %zu exceeds %d",
>   smbios_tables_len, SMBIOS_21_MAX_TABLES_LEN);
> @@ -927,7 +927,7 @@ void smbios_set_defaults(const char *manufacturer, const 
> char *product,
>  static void smbios_entry_point_setup(void)
>  {
>  switch (smbios_ep_type) {
> -case SMBIOS_ENTRY_POINT_21:
> +case SMBIOS_ENTRY_POINT_TYPE_32:
>  memcpy(ep.ep21.anchor_string, "_SM_", 4);
>  memcpy(ep.ep21.intermediate_anchor_string, "_DMI_", 5);
>

[PATCH] qtest: fix 'expression is always false' build failure in qtest_has_accel()

2021-10-27 Thread Igor Mammedov

If KVM is disabled or not present, qtest library build
may fail with:
   libqtest.c: In function 'qtest_has_accel':
  comparison of unsigned expression < 0 is always false
  [-Werror=type-limits]
 for (i = 0; i < ARRAY_SIZE(targets); i++) {

due to empty 'targets' array.
Fix it by compiling KVM related part only if
CONFIG_KVM_TARGETS is set.

Fixes: e741aff0f43343 ("tests: qtest: add qtest_has_accel() to check if tested 
binary supports accelerator")
Reported-by: Jason Andryuk 
Signed-off-by: Igor Mammedov 
---
 tests/qtest/libqtest.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 25aeea385b..9833e16f84 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -931,6 +931,7 @@ bool qtest_has_accel(const char *accel_name)
 return false;
 #endif
 } else if (g_str_equal(accel_name, "kvm")) {
+#if defined(CONFIG_KVM_TARGETS)
 int i;
 const char *arch = qtest_get_arch();
 const char *targets[] = { CONFIG_KVM_TARGETS };
@@ -942,6 +943,9 @@ bool qtest_has_accel(const char *accel_name)
 }
 }
 }
+#else
+return false;
+#endif
 } else {
 /* not implemented */
 g_assert_not_reached();
-- 
2.27.0

Re: [PATCH v4 6/8] iotests/300: avoid abnormal shutdown race condition

2021-10-27 Thread Hanna Reitz


On 26.10.21 19:07, John Snow wrote:



On Mon, Oct 25, 2021 at 9:20 AM Hanna Reitz  wrote:

On 13.10.21 23:57, John Snow wrote:
> Wait for the destination VM to close itself instead of racing to
shut it
> down first, which produces different error log messages from AQMP
> depending on precisely when we tried to shut it down.
>
> (For example: We may try to issue 'quit' immediately prior to
the target
> VM closing its QMP socket, which will cause an ECONNRESET error
to be
> logged. Waiting for the VM to exit itself avoids the race on
shutdown
> behavior.)
>
> Reported-by: Hanna Reitz 
> Signed-off-by: John Snow 
> ---
>   tests/qemu-iotests/300 | 12 
>   1 file changed, 4 insertions(+), 8 deletions(-)
>
> diff --git a/tests/qemu-iotests/300 b/tests/qemu-iotests/300
> index 10f9f2a8da6..bbea7248005 100755
> --- a/tests/qemu-iotests/300
> +++ b/tests/qemu-iotests/300
> @@ -24,8 +24,6 @@ import random
>   import re
>   from typing import Dict, List, Optional
>
> -from qemu.machine import machine
> -
>   import iotests
>
>
> @@ -461,12 +459,10 @@ class
TestBlockBitmapMappingErrors(TestDirtyBitmapMigration):
>                         f"'{self.src_node_name}': Name is longer
than 255 bytes",
>                         log)
>
> -        # Expect abnormal shutdown of the destination VM because of
> -        # the failed migration
> -        try:
> -            self.vm_b.shutdown()
> -        except machine.AbnormalShutdown:
> -            pass
> +        # Destination VM will terminate w/ error of its own accord
> +        # due to the failed migration.
> +        self.vm_b.wait()
> +        assert self.vm_b.exitcode() > 0

Trying to test, I can see that this fails iotest 297, because
`.exitcode()` is `Optional[int]`...

(I can’t believe how long it took me to figure this out – the message
“300:465: Unsupported operand types for < ("int" and "None")” made me
believe that it was 300 that was failing, because `exitcode()` was
returning `None` for some inconceivable reason.  I couldn’t
understand
why my usual test setup failed on every run, but I couldn’t get
300 to
fail manually...  Until I noticed that the message came below the
“297”
line, not the “300” line...)


Oops. Is there anything we can do to improve the visual clarity there?


You mean, like, separating iotests from the Python linting tests? :)

I think until then I’ll just need to pay more attention.

Hanna

Re: [PATCH] failover: specify an alternate MAC address

2021-10-27 Thread Michael S. Tsirkin

On Tue, Oct 26, 2021 at 05:03:05PM +0200, Laurent Vivier wrote:
> If the guest driver doesn't support the STANDBY feature, by default
> we keep the virtio-net device and don't hotplug the VFIO device,
> but in some cases, user can prefer to use the VFIO device rather
> than the virtio-net one. We can't unplug the virtio-net device
> (because on migration it is expected on the destination side) but
> we can keep both interfaces if the MAC addresses are different
> (to have the same MAC address can cause kernel crash with old
> kernel). The VFIO device will be unplugged before the migration
> like in the normal failover migration but without a failover device.
> 
> This patch adds a new property to the virtio-net device: "alt-mac"

I'd rather give it some kind of name that reflects that
it's related to failover, and to legacy guests not
supporting STANDBY.  failover-legacy-mac ?

> 
> If an alternate MAC address is provided with "alt-mac" and the
> STANDBY feature is not supported, both interfaces are plugged
> but the standby interface (virtio-net) MAC address is set to the
> value provided by the "alt-mac" parameter.
> 
> If the STANDBY feature is supported by guest and QEMU, the virtio-net
> failover acts as usual.
> 
> Signed-off-by: Laurent Vivier 
> ---
>  docs/system/virtio-net-failover.rst | 17 ++
>  hw/net/virtio-net.c | 48 +++--
>  include/hw/virtio/virtio-net.h  |  6 
>  3 files changed, 62 insertions(+), 9 deletions(-)
> 
> diff --git a/docs/system/virtio-net-failover.rst 
> b/docs/system/virtio-net-failover.rst
> index 6002dc5d96e4..0b1699169ef6 100644
> --- a/docs/system/virtio-net-failover.rst
> +++ b/docs/system/virtio-net-failover.rst
> @@ -51,6 +51,23 @@ Usage
>is only for pairing the devices within QEMU. The guest kernel module
>net_failover will match devices with identical MAC addresses.
>  
> +  The VIRTIO_NET_F_STANDBY must be supported by both sides, QEMU and guest
> +  kernel, to allow the guest kernel module to pair the devices.
> +  If the kernel module doesn't support the feature, only the standby device
> +  (virtio-net) is plugged, this allows to do a live migration without any
> +  connection loss.
> +
> +  In this case, You can force QEMU to keep both interfaces by providing an

you ?

> +  alternate MAC address to the virtio-net device with the parameter alt-mac.
> +  Both interfaces will be available separately:
> +
> +  -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:6f:55:cc, \
> +bus=root2,failover=on,alt-mac=52:54:00:12:34:56
> +
> +  As in the failover case, the (vfio)-pci device will be unplugged
> +  automatically before the migration and plugged back on the destination side
> +  after the migration.

That's a weird way to explain this. I'd rather the documentation
just listed both options in a straight forward way.
And there's no need to repeat here how feature is used etc.

E.g. along the lines of

For legacy guests (including bios/EUFI) not supporting
VIRTIO_NET_F_STANDBY, two options exist:

1. if alt-mac has not been configured (default)
only the standby virtio-net device is visible to the guest

2. if alt-mac has been configured, virtio and vfio devices will be
presented to guest as two NIC devices, with virtio using
the alt-mac address.




> +
>  Hotplug
>  ---
>  
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index f2014d5ea0b3..7aba2ac78f87 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -45,6 +45,9 @@
>  #include "net_rx_pkt.h"
>  #include "hw/virtio/vhost.h"
>  
> +/* zero MAC address to check with */
> +static const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
> +
>  #define VIRTIO_NET_VM_VERSION11
>  
>  #define MAC_TABLE_ENTRIES64
> @@ -126,7 +129,6 @@ static void virtio_net_get_config(VirtIODevice *vdev, 
> uint8_t *config)
>  VirtIONet *n = VIRTIO_NET(vdev);
>  struct virtio_net_config netcfg;
>  NetClientState *nc = qemu_get_queue(n->nic);
> -static const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
>  
>  int ret = 0;
>  memset(&netcfg, 0 , sizeof(struct virtio_net_config));
> @@ -871,10 +873,21 @@ static void failover_add_primary(VirtIONet *n, Error 
> **errp)
>  error_propagate(errp, err);
>  }
>  
> +static void failover_plug_primary(VirtIONet *n)
> +{
> +Error *err = NULL;
> +
> +qapi_event_send_failover_negotiated(n->netclient_name);
> +qatomic_set(&n->failover_primary_hidden, false);
> +failover_add_primary(n, &err);
> +if (err) {
> +warn_report_err(err);
> +}
> +}
> +
>  static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
>  {
>  VirtIONet *n = VIRTIO_NET(vdev);
> -Error *err = NULL;
>  int i;
>  
>  if (n->mtu_bypass_backend &&
> @@ -921,12 +934,22 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
> uint64_t features)
>  memset(n->vlans, 0

Re: [PATCH v2 0/3] memory: memory_region_is_mapped() cleanups

2021-10-27 Thread Peter Xu

On Wed, Oct 27, 2021 at 09:12:08AM +0200, David Hildenbrand wrote:
> On 27.10.21 05:53, Peter Xu wrote:
> > On Tue, Oct 26, 2021 at 06:06:46PM +0200, David Hildenbrand wrote:
> >> This is the follow-up of [1].
> >>
> >> Playing with memory_region_is_mapped(), I realized that memory regions
> >> mapped via an alias behave a little bit "differently", as they don't have
> >> their ->container set.
> > 
> 
> Hi Peter,
> 
> thanks for your review!
> 
> > The patches look ok to me, though I have a few pure questions to ask..
> > 
> >> * memory_region_is_mapped() will never succeed for memory regions mapped
> >>   via an alias
> > 
> > I think you mentioned that in commit message of patch 2 that it fixes no 
> > real
> > problem so far, so I'm also wondering in which case it'll help.  Say, 
> > normally
> > when there's an alias of another MR and we want to know whether the MR is
> > mapped, we simply call memory_region_is_mapped() upon the alias .
> 
> Just to recap: in v1 I proposed to just document that it doesn't work on
> aliases, and folks weren't too happy to see regions mapped via aliases
> being special-cased where it might just be avoided.
> 
> > 
> > To verify my thoughts, I did look up a few memory_region_is_mapped() random
> > callers that used with alias and that's what they did:
> > 
> > Here'sthe dino.c example:
> > 
> > *** hw/hppa/dino.c:
> > gsc_to_pci_forwarding[151] if (!memory_region_is_mapped(mem)) {
> > gsc_to_pci_forwarding[155] } else if (memory_region_is_mapped(mem)) {
> > 
> > The "mem" points to:
> > 
> > MemoryRegion *mem = &s->pci_mem_alias[i];
> > 
> > Which is the alias.
> > 
> > Another one:
> > 
> > *** hw/pci-host/pnv_phb3.c:
> > pnv_phb3_check_m32[121]if (memory_region_is_mapped(&phb->mr_m32)) {
> > pnv_phb3_update_regions[1076]  if (memory_region_is_mapped(&phb->mr_m32)) {
> > 
> > Andmr_m32 is the alias MR itself:
> > 
> > memory_region_init_alias(&phb->mr_m32, OBJECT(phb), "phb3-m32",
> >  &phb->pci_mmio, start, size);
> > 
> > I mean, if it should always be very straightforward to fetch the alias mr, 
> > then
> > I'm just afraid patch 2 won't really help in any real use case but pure 
> > overhead.
> 
> That is true as long as it's not a mapping check in generic code. Say,
> we have a RAMBlock and use ->mr. Checking
> memory_region_is_mapped(rb->mr) is misleading if the region is mapped
> via aliases.
> 
> The reason I stumbled over this at all is a sanity check I added in
> 
> void memory_region_set_ram_discard_manager(MemoryRegion *mr,
>RamDiscardManager *rdm)
> {
> g_assert(memory_region_is_ram(mr) && !memory_region_is_mapped(mr));
> g_assert(!rdm || !mr->rdm);
> mr->rdm = rdm;
> }
> 
> If mr is only mapped via aliases (see the virtio-mem memslot series),
> this check is of no value, because even if the mr would be mapped via
> aliases, we would not be able to catch it.

Yeah, this is a sane check to ask for.

> 
> Having that said, the check is not 100% correct, because
> memory_region_is_mapped() does not indicate that we're actually mapped
> into an address space. But at least for memory devices (-> target use
> case of RamDiscardManager) with an underlying RAMBlock, it's pretty
> reliable -- and there is no easy way to check if we're mapped into an
> address space when aliases are involved.
> 
> Note that there is also a similar check in memory_region_is_mapped(),
> but I'm removing that as part of the virtio-mem memslot series, because
> it's not actually helpful in the context of RAMBlock migration (nothing
> might be mapped, but migration code would still try migrating such a
> RAMBlock and has to consider the RamDiscardManager).
> 
> 
> Another example of a generic code check is patch #1: the only reason it
> works right now is because NVDIMMs cannot exist before initial memory is
> created. But yes, there is a better way of doing it when we have a
> memdev at hand.

IMHO patch 1 is actually an example showing that when we want that explicit
meaning we can simply introduce another boolean in parent struct. :)

> 
> > 
> > And I hope we won't trigger problem with any use case where
> > memory_region_is_mapped() returned false previously but then it'll return 
> > true
> > after patch 2, because logically with the old code one can detect 
> > explicitly on
> > "whether this original MR is mapped somewhere, irrelevant of other alias
> > mappings upon this mr".  Patch 2 blurrs it from that pov.
> > 
> 
> I hope tests will catch that early. I ran some without surprises.
> 
> >> * memory_region_to_address_space(), memory_region_find(),
> >>   memory_region_find_rcu(), memory_region_present() won't work, which seems
> >>   okay, because we don't expect such memory regions getting passed to these
> >>   functions.
> > 
> > Looks right.
> > 
> >> * memory_region_to_absolute_addr() will result in a wrong address. As
> >>   the result is only used for tracing, t

Re: [PATCH] MAINTAINERS: Split HPPA TCG vs HPPA machines/hardware

2021-10-27 Thread Laurent Vivier


Le 27/10/2021 à 06:15, Philippe Mathieu-Daudé a écrit :

Cc'ing qemu-trivial@ (patch reviewed)

On 10/4/21 10:38, Philippe Mathieu-Daudé wrote:

Hardware emulated models don't belong to the TCG MAINTAINERS
section. Move them to the 'HP-PARISC Machines' section.

Signed-off-by: Philippe Mathieu-Daudé 
---
  MAINTAINERS | 5 ++---
  1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50435b8d2f5..002620c6cad 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -205,10 +205,7 @@ HPPA (PA-RISC) TCG CPUs
  M: Richard Henderson 
  S: Maintained
  F: target/hppa/
-F: hw/hppa/
  F: disas/hppa.c
-F: hw/net/*i82596*
-F: include/hw/net/lasi_82596.h
  
  M68K TCG CPUs

  M: Laurent Vivier 
@@ -1098,6 +1095,8 @@ R: Helge Deller 
  S: Odd Fixes
  F: configs/devices/hppa-softmmu/default.mak
  F: hw/hppa/
+F: hw/net/*i82596*
+F: include/hw/net/lasi_82596.h
  F: pc-bios/hppa-firmware.img
  
  M68K Machines






Applied to my trivial-patches branch.

Thanks,
Laurent

Re: [PATCH 8/8] x86-iommu: Fail early if vIOMMU specified after vfio-pci

2021-10-27 Thread Peter Xu

On Tue, Oct 26, 2021 at 05:11:39PM +0200, Igor Mammedov wrote:
> On Fri, 22 Oct 2021 10:14:29 +0800
> Peter Xu  wrote:
> 
> > Hi, Alex,
> > 
> > On Thu, Oct 21, 2021 at 04:30:39PM -0600, Alex Williamson wrote:
> > > On Thu, 21 Oct 2021 18:42:59 +0800
> > > Peter Xu  wrote:
> > >   
> > > > Scan the pci bus to make sure there's no vfio-pci device attached 
> > > > before vIOMMU
> > > > is realized.  
> > > 
> > > Sorry, I'm not onboard with this solution at all.
> > > 
> > > It would be really useful though if this commit log or a code comment
> > > described exactly the incompatibility for which vfio-pci devices are
> > > being called out here.  Otherwise I see this as a bit of magic voodoo
> > > that gets lost in lore and copied elsewhere and we're constantly trying
> > > to figure out specific incompatibilities when vfio-pci devices are
> > > trying really hard to be "just another device".  
> > 
> > Sure, I can enrich the commit message.
> > 
> > > 
> > > I infer from the link of the previous alternate solution that this is
> > > to do with the fact that vfio devices attach a memory listener to the
> > > device address space.  
> > 
> > IMHO it's not about the memory listeners, I think that' after vfio detected
> > some vIOMMU memory regions already, which must be based on an vIOMMU address
> > space being available.  I think the problem is that when realize() vfio-pci 
> > we
> > fetch the dma address space specifically for getting the vfio group, while 
> > that
> > could happen too early, even before vIOMMU is created.
> > 
> > > Interestingly that previous cover letter also discusses how vdpa devices
> > > might have a similar issue, which makes it confusing again that we're 
> > > calling
> > > out vfio-pci devices by name rather than for a behavior.  
> > 
> > Yes I'll need to see whether this approach will be accepted first.  I think
> > similar thing could help VDPA but it's not required there because VDPA has
> > already worked around using pci_device_iommu_address_space().  So 
> > potentially
> > the only one to "fix" is the vfio-pci device using along with vIOMMU, when 
> > the
> > device ordering is specified in the wrong order.  I'll leave the VDPA 
> > problem
> > to Jason to see whether he prefers keeping current code, or switch to a 
> > simpler
> > one.  That should be after this one.
> > 
> > > 
> > > If the behavior here is that vfio-pci devices attach a listener to the
> > > device address space, then that provides a couple possible options.  We
> > > could look for devices that have recorded an interest in their address
> > > space, such as by setting a flag on PCIDevice when someone calls
> > > pci_device_iommu_address_space(), where we could walk all devices using
> > > the code in this series to find a device with such a flag.  
> > 
> > Right, we can set a flag for all the pci devices that needs to consolidate
> > pci_device_iommu_address_space() result, however then it'll be vfio-pci 
> > only so
> > far.  Btw, I actually proposed similar things two months ago, and I think 
> > Igor
> > showed concern on that flag being vague on meaning:
> 
> (1)
> > https://lore.kernel.org/qemu-devel/20210906104915.7dd5c...@redhat.com/
> 
> > 
> >   > > Does it need to be a pre_plug hook?  I thought we might just need a 
> > flag in the
> >   > > pci device classes showing that it should be after vIOMMUs, then in 
> > vIOMMU
> >   > > realize functions we walk pci bus to make sure no such device exist?
> >   > > 
> >   > > We could have a base vIOMMU class, then that could be in the 
> > realize() of the
> >   > > common class.  
> >   > 
> >   > We basically don't know if device needs IOMMU or not and can work
> >   > with/without it just fine. In this case I'd think about IOMMU as board
> >   > feature that morphs PCI buses (some of them) (address space, bus 
> > numers, ...).
> >   > So I don't perceive any iommu flag as a device property at all.
> >   > 
> >   > As for realize vs pre_plug, the later is the part of abstract realize
> >   > (see: device_set_realized) and is already used by some PCI 
> > infrastructure:
> >   >   ex: pcie_cap_slot_pre_plug_cb/spapr_pci_pre_plug  
> > 
> > I still think that flag will work, that flag should only shows "whether this
> > device needs to be specified earlier than vIOMMU", but I can get the point 
> > from
> > Igor that it's at least confusing on what does the flag mean.
> 
> > Meanwhile I
> > don't think that flag will be required, as this is not the first time we 
> > name a
> > special device in the code, e.g. pc_machine_device_pre_plug_cb().
> > intel_iommu.c has it too upon vfio-pci already on making sure 
> > caching-mode=on
> > in vtd_machine_done_notify_one().
> 
> I pointed to specifically to _pre_plug() handler and not as
> implemented here in realize().
> Reasoning behind it is that some_device_realize() should not poke
> into other devices, while pc_machine_device_pre_plug_cb() is
> part of machine code can/may legitimately access its

Re: [PATCH] qtest: fix 'expression is always false' build failure in qtest_has_accel()

2021-10-27 Thread Michael S. Tsirkin

On Wed, Oct 27, 2021 at 03:45:42AM -0400, Igor Mammedov wrote:
> If KVM is disabled or not present, qtest library build
> may fail with:
>libqtest.c: In function 'qtest_has_accel':
>   comparison of unsigned expression < 0 is always false
>   [-Werror=type-limits]
>  for (i = 0; i < ARRAY_SIZE(targets); i++) {
> 
> due to empty 'targets' array.
> Fix it by compiling KVM related part only if
> CONFIG_KVM_TARGETS is set.
> 
> Fixes: e741aff0f43343 ("tests: qtest: add qtest_has_accel() to check if 
> tested binary supports accelerator")
> Reported-by: Jason Andryuk 
> Signed-off-by: Igor Mammedov 


> ---
>  tests/qtest/libqtest.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 25aeea385b..9833e16f84 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -931,6 +931,7 @@ bool qtest_has_accel(const char *accel_name)
>  return false;
>  #endif
>  } else if (g_str_equal(accel_name, "kvm")) {
> +#if defined(CONFIG_KVM_TARGETS)

Ugh.
const char *targets[] = { CONFIG_KVM_TARGETS };

are you use CONFIG_KVM_TARGETS is not defined?

Looks like it's defined, just empty.

which is not standard C BTW.

How about we just add an empty string in meson?

diff --git a/meson.build b/meson.build
index 2c5b53cbe2..ff85e9c2af 100644
--- a/meson.build
+++ b/meson.build
@@ -75,7 +75,7 @@ else
   kvm_targets = []
 endif
 
-kvm_targets_c = ''
+kvm_targets_c = '""'
 if not get_option('kvm').disabled() and targetos == 'linux'
   kvm_targets_c = '"' + '" ,"'.join(kvm_targets) + '"'
 endif



>  int i;
>  const char *arch = qtest_get_arch();
>  const char *targets[] = { CONFIG_KVM_TARGETS };
> @@ -942,6 +943,9 @@ bool qtest_has_accel(const char *accel_name)
>  }
>  }
>  }
> +#else
> +return false;
> +#endif
>  } else {
>  /* not implemented */
>  g_assert_not_reached();
> -- 
> 2.27.0

Re: [PATCH v2 1/4] qemu-img: implement compare --stat

2021-10-27 Thread Hanna Reitz


On 26.10.21 11:18, Vladimir Sementsov-Ogievskiy wrote:

26.10.2021 11:47, Hanna Reitz wrote:

On 26.10.21 09:53, Vladimir Sementsov-Ogievskiy wrote:

25.10.2021 19:40, Hanna Reitz wrote:

On 21.10.21 12:12, Vladimir Sementsov-Ogievskiy wrote:

With new option qemu-img compare will not stop at first mismatch, but
instead calculate statistics: how many clusters with different data,
how many clusters with equal data, how many clusters were unallocated
but become data and so on.

We compare images chunk by chunk. Chunk size depends on what
block_status returns for both images. It may return less than cluster
(remember about qcow2 subclusters), it may return more than 
cluster (if

several consecutive clusters share same status). Finally images may
have different cluster sizes. This all leads to ambiguity in how to
finally compare the data.

What we can say for sure is that, when we compare two qcow2 images 
with

same cluster size, we should compare clusters with data separately.
Otherwise, if we for example compare 10 consecutive clusters of data
where only one byte differs we'll report 10 different clusters.
Expected result in this case is 1 different cluster and 9 equal ones.

So, to serve this case and just to have some defined rule let's do 
the

following:

1. Select some block-size for compare procedure. In this commit it 
must
    be specified by user, next commit will add some automatic 
logic and

    make --block-size optional.

2. Go chunk-by-chunk using block_status as we do now with only one
    differency:
    If block_status() returns DATA region that intersects block-size
    aligned boundary, crop this region at this boundary.

This way it's still possible to compare less than cluster and report
subcluster-level accuracy, but we newer compare more than one cluster
of data.

Signed-off-by: Vladimir Sementsov-Ogievskiy 


---
  docs/tools/qemu-img.rst |  18 +++-
  qemu-img.c  | 206 
+---

  qemu-img-cmds.hx    |   4 +-
  3 files changed, 212 insertions(+), 16 deletions(-)


Looks good to me overall!  Just some technical comments below.


diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index d58980aef8..21164253d4 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -159,6 +159,18 @@ Parameters to compare subcommand:
    Strict mode - fail on different image size or sector allocation
+.. option:: --stat
+
+  Instead of exit on first mismatch compare the whole images and 
print
+  statistics on amount of different pairs of clusters, based on 
their

+  block-status and are they equal or not.


I’d phrase this as:

Instead of exiting on the first mismatch, compare the whole images 
and print statistics on how much they differ in terms of block 
status (i.e. are blocks allocated or not, do they contain data, are 
they marked as containing only zeroes) and block content (a block 
of data that contains only zero still has the same content as a 
marked-zero block).


For me the rest starting from "and block content" sounds unclear, 
seems doesn't add any information to previous (i.e. are blocks 
allocated ...)


By “block content” I meant what you said by “equal or not”, i.e. what 
is returned when reading from the block.


Now that I think about it again, I believe we should go with your 
original “equal or not”, though, because that reflects what qemu-img 
--stat prints, like so perhaps:


Instead of exiting on the first mismatch, compare the whole images 
and print statistics on the amount of different pairs of blocks, 
based on their block status and whether they are equal or not.


OK



I’d still like to add something like what I had in parentheses, 
though, because as a user, I’d find the “block status” and “equal or 
not” terms to be a bit handwave-y.  I don’t think “block status” is a 
common term in our documentation, so I wanted to add some examples; 
and I wanted to show by example that “equal” blocks don’t need to 
have the same block status.


Actually, I think, that the resulting tool is anyway 
developer-oriented, to use it you should know what this block status 
really means.. May be just s/block status/block type/, will it be more 
human friendly?


Oh, OK.  I think I’d prefer “block status” still, because that’s what we 
use in the code.


Hanna

Re: [PATCH] target/m68k: Optimize shift_mem() using extract() TCG opcode

2021-10-27 Thread Philippe Mathieu-Daudé

On 10/3/21 17:30, Richard Henderson wrote:
> On 10/3/21 10:24 AM, Philippe Mathieu-Daudé wrote:
>> When running the scripts/coccinelle/tcg_gen_extract.cocci
>> Coccinelle semantic patch on target/m68k/, we get:
>>
>>  [DBG] candidate at target/m68k/translate.c:3668
>>
>> Manually inspect and replace combinations of (shri, andi)
>> and (movi, andi) opcodes by the extract opcode.
>>
>> Signed-off-by: Philippe Mathieu-Daudé
>> ---
>>   target/m68k/translate.c | 5 ++---
>>   1 file changed, 2 insertions(+), 3 deletions(-)
> 
> Reviewed-by: Richard Henderson 

Another candidate for tcg-next :)

Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)

2021-10-27 Thread Cédric Le Goater


I am raising this because the nanoMIPS support is in deprecated state
since more than 2 releases, but it is still in-tree and I try to keep
it functional. However, since no toolchain reached mainstream GCC/LLVM
it is not easy to maintain. By keeping it in that state we give some
time to other communities to have their toolchain upstreamed / merged.


If you're trying to keep it functional and aren't going to remove
it, then it shouldn't be marked deprecated.


OK, I'll move it back to Odd-fixes.


The ppc405 boards are still in pretty bad shape. We need a patched u-boot,
a patched QEMU and a patched Linux and still, we do not reach user space
without some sort of crash.

C.

Re: [PATCH 1/1] gitlab-ci: Only push docker images to registry from /master branch

2021-10-27 Thread Daniel P . Berrangé

On Wed, Oct 27, 2021 at 08:01:07AM +0200, Thomas Huth wrote:
> On 26/10/2021 16.55, Philippe Mathieu-Daudé wrote:
> > Users expect images pulled from registry.gitlab.com/qemu-project/qemu/
> > to be stable. QEMU repository workflow pushes merge candidates to
> > the /staging branch, and on success the same commit is pushed as
> > /master. If /staging fails, we do not want to push the built images
> > to the registry. Therefore limit the 'docker push' command to the
> > /master branch on the mainstream CI. The fork behavior is unchanged.
> 
> Hmmm, what if I have a patch series that updates one of the containers and
> then also contains a new test that depends on the updated container? Won't
> that fail in the staging branch now and make me look bad?

Yep, if a patch series contains a dockerfile change we *must* run the
container build no matter what, so tis patch doesn't fly.

This scenario a tricky problem, and I'm not seeing an easy answer to it
so far.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 0/3] hw/input/lasips2: QOM'ify the Lasi PS/2

2021-10-27 Thread Laurent Vivier


Le 27/10/2021 à 07:11, Philippe Mathieu-Daudé a écrit :

Cc'ing qemu-trivial@ (fully reviewed).

On 9/20/21 08:40, Philippe Mathieu-Daudé wrote:

Slowly nuking non-QOM devices: Lasi PS/2's turn.

Philippe Mathieu-Daudé (3):
   hw/input/lasips2: Fix typos in function names
   hw/input/lasips2: Move LASIPS2State declaration to
 'hw/input/lasips2.h'
   hw/input/lasips2: QOM'ify the Lasi PS/2

  include/hw/input/lasips2.h | 31 --
  hw/hppa/lasi.c | 10 +-
  hw/input/lasips2.c | 64 +++---
  3 files changed, 70 insertions(+), 35 deletions(-)





Applied to my trivial-patches branch.

Thanks,
Laurent

Re: [PATCH v7 0/4] VNC-related HMP/QMP fixes

2021-10-27 Thread Dr. David Alan Gilbert

* Markus Armbruster (arm...@redhat.com) wrote:
> "Dr. David Alan Gilbert"  writes:
> 
> > * Markus Armbruster (arm...@redhat.com) wrote:
> >> I'm done reviewing.  David, care to have another look at the HMP part?
> >
> > Yep, looking good to me - is that going via qmp, hmp, or vnc ?
> 
> Either is fine with me.
> 
> David, Gerd, do you have anything queued up already?

No I've also got nothing else at th emoment.

Dave

-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK

Re: [PATCH-for-6.0.1 0/2] gitlab-ci: Only push docker images to mainstream registry from /master

2021-10-27 Thread Daniel P . Berrangé

On Wed, Oct 27, 2021 at 07:26:54AM +0200, Philippe Mathieu-Daudé wrote:
> Hi Michael,
> 
> 2 more patches to avoid gitlab-ci mayhem when you push the
> stable tags. See this cover for more info:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg846861.html

Please don't push this to stable - Thomas points out that it is broken
when any changes to dockerfiles are made.

> 
> Based-on: <20211019140944.152419-1-michael.r...@amd.com>
> "Patch Round-up for stable 6.0.1, freeze on 2021-10-26"
> 
> Daniel P. Berrangé (1):
>   gitlab: only let pages be published from default branch
> 
> Philippe Mathieu-Daudé (1):
>   gitlab-ci: Only push docker images to registry from /master branch
> 
>  .gitlab-ci.d/containers.yml | 11 ++-
>  .gitlab-ci.d/edk2.yml   | 11 ++-
>  .gitlab-ci.d/opensbi.yml| 11 ++-
>  .gitlab-ci.yml  | 18 ++
>  4 files changed, 48 insertions(+), 3 deletions(-)
> 
> -- 
> 2.31.1
> 
> 
> 

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 1/1] gitlab-ci: Only push docker images to registry from /master branch

2021-10-27 Thread Philippe Mathieu-Daudé

Hi Thomas,

On 10/27/21 08:01, Thomas Huth wrote:
> On 26/10/2021 16.55, Philippe Mathieu-Daudé wrote:
>> Users expect images pulled from registry.gitlab.com/qemu-project/qemu/
>> to be stable. QEMU repository workflow pushes merge candidates to
>> the /staging branch, and on success the same commit is pushed as
>> /master. If /staging fails, we do not want to push the built images
>> to the registry. Therefore limit the 'docker push' command to the
>> /master branch on the mainstream CI. The fork behavior is unchanged.
> 
> Hmmm, what if I have a patch series that updates one of the containers
> and then also contains a new test that depends on the updated container?
> Won't that fail in the staging branch now and make me look bad?

Good point. My understanding is:

- All tests based on Docker containers pull from DOCKER_DEFAULT_REGISTRY
  (see tests/docker/Makefile.include). These tests can be run on CI
  but are principally run locally, using the ':latest' tag, which is
  the reason we don't want to push invalid / oudated images.

- CI tests also use the Docker containers. We might want to test freshly
  built images. Here we should explicit the use of local tag (without
  using the registry prefix) in this case. Otherwise default to the
  latest from registry (stable).

So the problem you mentioned is the second case, and should be reworked
in the YAML. I will revisit the overall CI YAML for your case, but the
first case seems equally important.

Regards,

Phil.

[PATCH] Fix tcg_out_vec_op argument type

2021-10-27 Thread Miroslav Rezanina

Newly defined tcg_out_vec_op (34ef767609 tcg/s390x: Add host vector framework)
for s390x uses pointer argument definition.
This fails on gcc 11 as original declaration uses array argument:

In file included from ../tcg/tcg.c:430:
/builddir/build/BUILD/qemu-6.1.50/tcg/s390x/tcg-target.c.inc:2702:42: error: 
argument 5 of type 'const TCGArg *' {aka 'const long unsigned int *'} declared 
as a pointer [-Werror=array-parameter=]
 2702 |const TCGArg *args, const int *const_args)
  |~~^~~~
../tcg/tcg.c:121:41: note: previously declared as an array 'const TCGArg[16]' 
{aka 'const long unsigned int[16]'}
  121 |const TCGArg args[TCG_MAX_OP_ARGS],
  |~^
In file included from ../tcg/tcg.c:430:
/builddir/build/BUILD/qemu-6.1.50/tcg/s390x/tcg-target.c.inc:2702:59: error: 
argument 6 of type 'const int *' declared as a pointer 
[-Werror=array-parameter=]
 2702 |const TCGArg *args, const int *const_args)
  |~~~^~
../tcg/tcg.c:122:38: note: previously declared as an array 'const int[16]'
  122 |const int const_args[TCG_MAX_OP_ARGS]);
  |~~^~~

Fixing argument type to pass build.

Signed-off-by: Miroslav Rezanina 
---
 tcg/s390x/tcg-target.c.inc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index 8938c446c8..57e803e339 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -2699,7 +2699,8 @@ static void tcg_out_dupi_vec(TCGContext *s, TCGType type, 
unsigned vece,
 
 static void tcg_out_vec_op(TCGContext *s, TCGOpcode opc,
unsigned vecl, unsigned vece,
-   const TCGArg *args, const int *const_args)
+   const TCGArg args[TCG_MAX_OP_ARGS],
+   const int const_args[TCG_MAX_OP_ARGS])
 {
 TCGType type = vecl + TCG_TYPE_V64;
 TCGArg a0 = args[0], a1 = args[1], a2 = args[2];
-- 
2.27.0

[PATCH v3 2/2] vfio/common: Add a trace point when a MMIO RAM section cannot be mapped

2021-10-27 Thread Kunkun Jiang

The MSI-X structures of some devices and other non-MSI-X structures
may be in the same BAR. They may share one host page, especially in
the case of large page granularity, such as 64K.

For example, MSIX-Table size of 82599 NIC is 0x30 and the offset in
Bar 3(size 64KB) is 0x0. vfio_listener_region_add() will be called
to map the remaining range (0x30-0x). If host page size is 64KB,
it will return early at 'int128_ge((int128_make64(iova), llend))'
without any message. Let's add a trace point to inform users like commit
5c08600547c0 ("vfio: Use a trace point when a RAM section cannot be DMA mapped")
did.

Signed-off-by: Kunkun Jiang 
---
 hw/vfio/common.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index a784b219e6..dd387b0d39 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -893,6 +893,13 @@ static void vfio_listener_region_add(MemoryListener 
*listener,
 llend = int128_and(llend, int128_exts64(qemu_real_host_page_mask));
 
 if (int128_ge(int128_make64(iova), llend)) {
+if (memory_region_is_ram_device(section->mr)) {
+trace_vfio_listener_region_add_no_dma_map(
+memory_region_name(section->mr),
+section->offset_within_address_space,
+int128_getlo(section->size),
+qemu_real_host_page_size);
+}
 return;
 }
 end = int128_get64(int128_sub(llend, int128_one()));
-- 
2.23.0

[PATCH v3 1/2] vfio/pci: Add support for mmapping sub-page MMIO BARs after live migration

2021-10-27 Thread Kunkun Jiang

We can expand MemoryRegions of sub-page MMIO BARs in
vfio_pci_write_config() to improve IO performance for some
devices. However, the MemoryRegions of destination VM are
not expanded any more after live migration. Because their
addresses have been updated in vmstate_load_state()
(vfio_pci_load_config) and vfio_sub_page_bar_update_mapping()
will not be called.

This may result in poor performance after live migration.
So iterate BARs in vfio_pci_load_config() and try to update
sub-page BARs.

Reported-by: Nianyao Tang 
Reported-by: Qixin Gan 
Signed-off-by: Kunkun Jiang 
---
 hw/vfio/pci.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5cdf1d4298..7b45353ce2 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2453,7 +2453,12 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, 
QEMUFile *f)
 {
 VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
 PCIDevice *pdev = &vdev->pdev;
-int ret;
+pcibus_t old_addr[PCI_NUM_REGIONS - 1];
+int bar, ret;
+
+for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
+old_addr[bar] = pdev->io_regions[bar].addr;
+}
 
 ret = vmstate_load_state(f, &vmstate_vfio_pci_config, vdev, 1);
 if (ret) {
@@ -2463,6 +2468,18 @@ static int vfio_pci_load_config(VFIODevice *vbasedev, 
QEMUFile *f)
 vfio_pci_write_config(pdev, PCI_COMMAND,
   pci_get_word(pdev->config + PCI_COMMAND), 2);
 
+for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
+/*
+ * The address may not be changed in some scenarios
+ * (e.g. the VF driver isn't loaded in VM).
+ */
+if (old_addr[bar] != pdev->io_regions[bar].addr &&
+vdev->bars[bar].region.size > 0 &&
+vdev->bars[bar].region.size < qemu_real_host_page_size) {
+vfio_sub_page_bar_update_mapping(pdev, bar);
+}
+}
+
 if (msi_enabled(pdev)) {
 vfio_msi_enable(vdev);
 } else if (msix_enabled(pdev)) {
-- 
2.23.0

[PATCH v3 0/2] vfio: Some fixes about vfio-pci MMIO BARs mapping

2021-10-27 Thread Kunkun Jiang

This series include patches as below:

Patch 1:
- Added support for mmapping sub-page MMIO BAR after live migration to improve
  IO performance for some devices

Patch 2:
- Added a trace point to informe users when a MMIO RAM section cannot be mapped

History:

v2 -> v3:
- Modify commit message [Eric Auger]

v1 -> v2:
- Add iterate sub-page BARs in vfio_pci_load_config and try to update them 
[Alex Williamson]

Kunkun Jiang (2):
  vfio/pci: Add support for mmapping sub-page MMIO BARs after live
migration
  vfio/common: Add a trace point when a MMIO RAM section cannot be
mapped

 hw/vfio/common.c |  7 +++
 hw/vfio/pci.c| 19 ++-
 2 files changed, 25 insertions(+), 1 deletion(-)

-- 
2.23.0

Re: [PATCH v2 1/3] file-posix: add `aio-max-batch` option

2021-10-27 Thread Stefan Hajnoczi

On Tue, Oct 26, 2021 at 06:23:44PM +0200, Stefano Garzarella wrote:
> Commit d7ddd0a161 ("linux-aio: limit the batch size using
> `aio-max-batch` parameter") added a way to limit the batch size
> of Linux AIO backend for the entire AIO context.
> 
> The same AIO context can be shared by multiple devices, so
> latency-sensitive devices may want to limit the batch size even
> more to avoid increasing latency.
> 
> For this reason we add the `aio-max-batch` option to the file
> backend, which will be used by the next commits to limit the size of
> batches including requests generated by this device.
> 
> Suggested-by: Kevin Wolf 
> Reviewed-by: Kevin Wolf 
> Signed-off-by: Stefano Garzarella 
> ---
> 
> Notes:
> v2:
> - @aio-max-batch documentation rewrite [Stefan, Kevin]
> 
>  qapi/block-core.json | 7 +++
>  block/file-posix.c   | 9 +
>  2 files changed, 16 insertions(+)

Reviewed-by: Stefan Hajnoczi 


signature.asc
Description: PGP signature

Re: [PATCH v2 1/3] file-posix: add `aio-max-batch` option

2021-10-27 Thread Stefano Garzarella


On Wed, Oct 27, 2021 at 06:28:51AM +0200, Markus Armbruster wrote:

Stefano Garzarella  writes:


Commit d7ddd0a161 ("linux-aio: limit the batch size using
`aio-max-batch` parameter") added a way to limit the batch size
of Linux AIO backend for the entire AIO context.

The same AIO context can be shared by multiple devices, so
latency-sensitive devices may want to limit the batch size even
more to avoid increasing latency.

For this reason we add the `aio-max-batch` option to the file
backend, which will be used by the next commits to limit the size of
batches including requests generated by this device.

Suggested-by: Kevin Wolf 
Reviewed-by: Kevin Wolf 
Signed-off-by: Stefano Garzarella 
---

Notes:
v2:
- @aio-max-batch documentation rewrite [Stefan, Kevin]

 qapi/block-core.json | 7 +++
 block/file-posix.c   | 9 +
 2 files changed, 16 insertions(+)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6d3217abb6..fef76b0ea2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2896,6 +2896,12 @@
 #  for this device (default: none, forward the commands via SG_IO;
 #  since 2.11)
 # @aio: AIO backend (default: threads) (since: 2.8)
+# @aio-max-batch: maximum number of requests to batch together into a single
+# submission in the AIO backend. The smallest value between
+# this and the aio-max-batch value of the IOThread object is
+# chosen.
+# 0 means that the AIO backend will handle it automatically.
+# (default: 0, since 6.2)


"(default 0) (since 6.2)" seems to be more common.


Indeed I wasn't sure, so I followed @drop-cache, the last one added in 
@BlockdevOptionsFile.


I'll fix in v3 :-)

Thanks,
Stefano

Re: [PATCH] target/avr: Optimize various functions using extract opcode

2021-10-27 Thread Michael Rolnik

hi Philippe

how was it tested?

On Wed, Oct 27, 2021 at 7:42 AM Philippe Mathieu-Daudé 
wrote:

> Hi Richard,
>
> On 10/3/21 17:24, Richard Henderson wrote:
> > On 10/3/21 10:21 AM, Philippe Mathieu-Daudé wrote:
> >> When running the scripts/coccinelle/tcg_gen_extract.cocci
> >> Coccinelle semantic patch on target/avr/, we get:
> >>
> >>[DBG] candidate at target/avr/translate.c:228
> >>[DBG] candidate at target/avr/translate.c:266
> >>[DBG] candidate at target/avr/translate.c:885
> >>[DBG] candidate at target/avr/translate.c:924
> >>[DBG] candidate at target/avr/translate.c:962
> >>
> >> Manually inspect and replace combinations of (shri, andi)
> >> opcodes by the extract opcode.
> >>
> >> Signed-off-by: Philippe Mathieu-Daudé
> >> ---
> >>   target/avr/translate.c | 16 +---
> >>   1 file changed, 5 insertions(+), 11 deletions(-)
> >
> > Reviewed-by: Richard Henderson 
>
> Do you mind taking this patch via tcg-next?
>


-- 
Best Regards,
Michael Rolnik

Re: [PATCH 3/6] hw/sh4: Change debug printfs to traces

2021-10-27 Thread BALATON Zoltan


On Tue, 26 Oct 2021, Richard Henderson wrote:

On 10/26/21 6:32 PM, BALATON Zoltan wrote:

+trace_sh_serial("write", size, offs, val);
  switch (offs) {
  case 0x00: /* SMR */
  s->smr = val & ((s->feat & SH_SERIAL_FEAT_SCIF) ? 0x7b : 0xff);
@@ -302,10 +298,7 @@ static uint64_t sh_serial_read(void *opaque, hwaddr 
offs,

  break;
  }
  }
-#ifdef DEBUG_SERIAL
-printf("sh_serial: read offs=0x%02x val=0x%x\n",
-   offs, ret);
-#endif
+trace_sh_serial("read ", size, offs, ret);


I suggest two separate sh_serial_{read,write} tracepoints.


Thought about that but it's unlikely one would only want to trace one 
direction, more likely to want all access to the device. But if it's a 
requirement I can split this into separate _read and _write.


Regards,
BALATON Zoltan

[PATCH v2] failover: specify an alternate MAC address

2021-10-27 Thread Laurent Vivier

If the guest driver doesn't support the STANDBY feature, by default
we keep the virtio-net device and don't hotplug the VFIO device,
but in some cases, user can prefer to use the VFIO device rather
than the virtio-net one. We can't unplug the virtio-net device
(because on migration it is expected on the destination side) but
we can keep both interfaces if the MAC addresses are different
(to have the same MAC address can cause kernel crash with old
kernel). The VFIO device will be unplugged before the migration
like in the normal failover migration but without a failover device.

This patch adds a new property to the virtio-net device:
"failover-legacy-mac"

If an alternate MAC address is provided with "failover-legacy-mac" and
the STANDBY feature is not supported, both interfaces are plugged
but the standby interface (virtio-net) MAC address is set to the
value provided by the "failover-legacy-mac" parameter.

If the STANDBY feature is supported by guest and QEMU, the virtio-net
failover acts as usual.

Signed-off-by: Laurent Vivier 
---

Notes:
v2: rename alt-mac to failover-legacy-mac
update doc with text provided by MST

 docs/system/virtio-net-failover.rst | 10 ++
 hw/net/virtio-net.c | 48 +++--
 include/hw/virtio/virtio-net.h  |  6 
 3 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/docs/system/virtio-net-failover.rst 
b/docs/system/virtio-net-failover.rst
index 6002dc5d96e4..99f21cd55ef7 100644
--- a/docs/system/virtio-net-failover.rst
+++ b/docs/system/virtio-net-failover.rst
@@ -51,6 +51,16 @@ Usage
   is only for pairing the devices within QEMU. The guest kernel module
   net_failover will match devices with identical MAC addresses.
 
+  For legacy guests (including bios/EUFI) not supporting VIRTIO_NET_F_STANDBY,
+  two options exist:
+
+  1. if failover-legacy-mac has not been configured (default)
+ only the standby virtio-net device is visible to the guest
+
+  2. if failover-legacy-mac has been configured, virtio and vfio devices will
+ be presented to guest as two NIC devices, with virtio using the
+ failover-legacy-mac address.
+
 Hotplug
 ---
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f2014d5ea0b3..0d47d287de14 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -45,6 +45,9 @@
 #include "net_rx_pkt.h"
 #include "hw/virtio/vhost.h"
 
+/* zero MAC address to check with */
+static const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
+
 #define VIRTIO_NET_VM_VERSION11
 
 #define MAC_TABLE_ENTRIES64
@@ -126,7 +129,6 @@ static void virtio_net_get_config(VirtIODevice *vdev, 
uint8_t *config)
 VirtIONet *n = VIRTIO_NET(vdev);
 struct virtio_net_config netcfg;
 NetClientState *nc = qemu_get_queue(n->nic);
-static const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
 
 int ret = 0;
 memset(&netcfg, 0 , sizeof(struct virtio_net_config));
@@ -871,10 +873,21 @@ static void failover_add_primary(VirtIONet *n, Error 
**errp)
 error_propagate(errp, err);
 }
 
+static void failover_plug_primary(VirtIONet *n)
+{
+Error *err = NULL;
+
+qapi_event_send_failover_negotiated(n->netclient_name);
+qatomic_set(&n->failover_primary_hidden, false);
+failover_add_primary(n, &err);
+if (err) {
+warn_report_err(err);
+}
+}
+
 static void virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
-Error *err = NULL;
 int i;
 
 if (n->mtu_bypass_backend &&
@@ -921,12 +934,22 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
uint64_t features)
 memset(n->vlans, 0xff, MAX_VLAN >> 3);
 }
 
-if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
-qapi_event_send_failover_negotiated(n->netclient_name);
-qatomic_set(&n->failover_primary_hidden, false);
-failover_add_primary(n, &err);
-if (err) {
-warn_report_err(err);
+if (n->failover) {
+if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
+if (memcmp(&n->legacy_mac, &zero, sizeof(zero)) != 0 &&
+memcmp(n->mac, &n->legacy_mac, ETH_ALEN) == 0) {
+/*
+ * set_features can be called twice, without & with F_STANDBY,
+ * so restore original MAC address
+ */
+memcpy(n->mac, &n->nic->conf->macaddr, sizeof(n->mac));
+qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
+}
+failover_plug_primary(n);
+} else if (memcmp(&n->legacy_mac, &zero, sizeof(zero)) != 0) {
+memcpy(n->mac, &n->legacy_mac, ETH_ALEN);
+qemu_format_nic_info_str(qemu_get_queue(n->nic), n->mac);
+failover_plug_primary(n);
 }
 }
 }
@@ -3595,9 +3618,15 @@ static bool primary_unplug_pending(void *opaque)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIONet *n = VIRTIO_NET(vdev);
 
-

Re: [PATCH] Fix tcg_out_vec_op argument type

2021-10-27 Thread Philippe Mathieu-Daudé

On 10/27/21 10:56, Miroslav Rezanina wrote:
> Newly defined tcg_out_vec_op (34ef767609 tcg/s390x: Add host vector framework)
> for s390x uses pointer argument definition.
> This fails on gcc 11 as original declaration uses array argument:
> 
> In file included from ../tcg/tcg.c:430:
> /builddir/build/BUILD/qemu-6.1.50/tcg/s390x/tcg-target.c.inc:2702:42: error: 
> argument 5 of type 'const TCGArg *' {aka 'const long unsigned int *'} 
> declared as a pointer [-Werror=array-parameter=]
>  2702 |const TCGArg *args, const int *const_args)
>   |~~^~~~
> ../tcg/tcg.c:121:41: note: previously declared as an array 'const TCGArg[16]' 
> {aka 'const long unsigned int[16]'}
>   121 |const TCGArg args[TCG_MAX_OP_ARGS],
>   |~^
> In file included from ../tcg/tcg.c:430:
> /builddir/build/BUILD/qemu-6.1.50/tcg/s390x/tcg-target.c.inc:2702:59: error: 
> argument 6 of type 'const int *' declared as a pointer 
> [-Werror=array-parameter=]
>  2702 |const TCGArg *args, const int *const_args)
>   |~~~^~
> ../tcg/tcg.c:122:38: note: previously declared as an array 'const int[16]'
>   122 |const int const_args[TCG_MAX_OP_ARGS]);
>   |~~^~~
> 
> Fixing argument type to pass build.

Similarly to commit 5e8892db93f ("tcg: Fix prototypes for tcg_out_vec_op
and tcg_out_op"), fix the argument type.

> Signed-off-by: Miroslav Rezanina 
> ---
>  tcg/s390x/tcg-target.c.inc | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Reviewed-by: Philippe Mathieu-Daudé

Re: [PATCH-for-6.0.1 0/2] gitlab-ci: Only push docker images to mainstream registry from /master

2021-10-27 Thread Philippe Mathieu-Daudé

+Richard/Peter

On 10/27/21 10:49, Daniel P. Berrangé wrote:
> On Wed, Oct 27, 2021 at 07:26:54AM +0200, Philippe Mathieu-Daudé wrote:
>> Hi Michael,
>>
>> 2 more patches to avoid gitlab-ci mayhem when you push the
>> stable tags. See this cover for more info:
>> https://www.mail-archive.com/qemu-devel@nongnu.org/msg846861.html
> 
> Please don't push this to stable - Thomas points out that it is broken
> when any changes to dockerfiles are made.

But we still don't want to update the registry with these old
images...

What is the plan then, hold the stable tag until we figure out
the best fix?

Otherwise Michael can use 'git-push --push-option=ci.skip' to
not trigger a CI pipeline when pushing stable tags (running
CI pipelines previously in his own gitlab namespace).

>> Based-on: <20211019140944.152419-1-michael.r...@amd.com>
>> "Patch Round-up for stable 6.0.1, freeze on 2021-10-26"
>>
>> Daniel P. Berrangé (1):
>>   gitlab: only let pages be published from default branch
>>
>> Philippe Mathieu-Daudé (1):
>>   gitlab-ci: Only push docker images to registry from /master branch
>>
>>  .gitlab-ci.d/containers.yml | 11 ++-
>>  .gitlab-ci.d/edk2.yml   | 11 ++-
>>  .gitlab-ci.d/opensbi.yml| 11 ++-
>>  .gitlab-ci.yml  | 18 ++
>>  4 files changed, 48 insertions(+), 3 deletions(-)
>>
>> -- 
>> 2.31.1
>>
>>
>>
> 
> Regards,
> Daniel
>

Re: [PATCH v2] failover: specify an alternate MAC address

2021-10-27 Thread Philippe Mathieu-Daudé

On 10/27/21 11:59, Laurent Vivier wrote:
> If the guest driver doesn't support the STANDBY feature, by default
> we keep the virtio-net device and don't hotplug the VFIO device,
> but in some cases, user can prefer to use the VFIO device rather
> than the virtio-net one. We can't unplug the virtio-net device
> (because on migration it is expected on the destination side) but
> we can keep both interfaces if the MAC addresses are different
> (to have the same MAC address can cause kernel crash with old
> kernel). The VFIO device will be unplugged before the migration
> like in the normal failover migration but without a failover device.
> 
> This patch adds a new property to the virtio-net device:
> "failover-legacy-mac"
> 
> If an alternate MAC address is provided with "failover-legacy-mac" and
> the STANDBY feature is not supported, both interfaces are plugged
> but the standby interface (virtio-net) MAC address is set to the
> value provided by the "failover-legacy-mac" parameter.
> 
> If the STANDBY feature is supported by guest and QEMU, the virtio-net
> failover acts as usual.
> 
> Signed-off-by: Laurent Vivier 
> ---
> 
> Notes:
> v2: rename alt-mac to failover-legacy-mac
> update doc with text provided by MST
> 
>  docs/system/virtio-net-failover.rst | 10 ++
>  hw/net/virtio-net.c | 48 +++--
>  include/hw/virtio/virtio-net.h  |  6 
>  3 files changed, 55 insertions(+), 9 deletions(-)

> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index f2014d5ea0b3..0d47d287de14 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -45,6 +45,9 @@
>  #include "net_rx_pkt.h"
>  #include "hw/virtio/vhost.h"
>  
> +/* zero MAC address to check with */
> +static const MACAddr zero = { .a = { 0, 0, 0, 0, 0, 0 } };
> +

Eventual future cleanup, it might be clearer to declare it extern
in "net/net.h" and use the one in qemu_macaddr_default_if_unset().

Re: MMIO/PIO dispatch file descriptors (ioregionfd) design discussion

2021-10-27 Thread Stefan Hajnoczi

On Tue, Oct 26, 2021 at 08:01:39PM +0100, John Levon wrote:
> On Mon, Oct 25, 2021 at 08:21:22AM -0700, Elena wrote:
> 
> > > I'm curious what approach you want to propose for QEMU integration. A
> > > while back I thought about the QEMU API. It's possible to implement it
> > > along the lines of the memory_region_add_eventfd() API where each
> > > ioregionfd is explicitly added by device emulation code. An advantage of
> > > this approach is that a MemoryRegion can have multiple ioregionfds, but
> > > I'm not sure if that is a useful feature.
> > >
> > 
> > This is the approach that is currently in the works. Agree, I dont see
> > much of the application here at this point to have multiple ioregions
> > per MemoryRegion.
> > I added Memory API/eventfd approach to the vfio-user as well to try
> > things out.
> > 
> > > An alternative is to cover the entire MemoryRegion with one ioregionfd.
> > > That way the device emulation code can use ioregionfd without much fuss
> > > since there is a 1:1 mapping between MemoryRegions, which are already
> > > there in existing devices. There is no need to think deeply about which
> > > ioregionfds to create for a device.
> > >
> > > A new API called memory_region_set_aio_context(MemoryRegion *mr,
> > > AioContext *ctx) would cause ioregionfd (or a userspace fallback for
> > > non-KVM cases) to execute the MemoryRegion->read/write() accessors from
> > > the given AioContext. The details of ioregionfd are hidden behind the
> > > memory_region_set_aio_context() API, so the device emulation code
> > > doesn't need to know the capabilities of ioregionfd.
> > 
> > > 
> > > The second approach seems promising if we want more devices to use
> > > ioregionfd inside QEMU because it requires less ioregionfd-specific
> > > code.
> > > 
> > I like this approach as well.
> > As you have mentioned, the device emulation code with first approach
> > does have to how to handle the region accesses. The second approach will
> > make things more transparent. Let me see how can I modify what there is
> > there now and may ask further questions.
> 
> Sorry I'm a bit late to this discussion, I'm not clear on the above WRT
> vfio-user. If an ioregionfd has to cover a whole BAR0 (?), how would this
> interact with partly-mmap()able regions like we do with SPDK/vfio-user/NVMe?

The ioregionfd doesn't need to cover an entire BAR. QEMU's MemoryRegions
form a hierarchy, so it's possible to sub-divide the BAR into several
MemoryRegions.

This means it's still possible to have mmap() sub-regions or even
ioeventfds sprinkled in between.

Stefan


signature.asc
Description: PGP signature

Re: [PATCH-for-6.0.1 0/2] gitlab-ci: Only push docker images to mainstream registry from /master

2021-10-27 Thread Daniel P . Berrangé

On Wed, Oct 27, 2021 at 12:09:39PM +0200, Philippe Mathieu-Daudé wrote:
> +Richard/Peter
> 
> On 10/27/21 10:49, Daniel P. Berrangé wrote:
> > On Wed, Oct 27, 2021 at 07:26:54AM +0200, Philippe Mathieu-Daudé wrote:
> >> Hi Michael,
> >>
> >> 2 more patches to avoid gitlab-ci mayhem when you push the
> >> stable tags. See this cover for more info:
> >> https://www.mail-archive.com/qemu-devel@nongnu.org/msg846861.html
> > 
> > Please don't push this to stable - Thomas points out that it is broken
> > when any changes to dockerfiles are made.
> 
> But we still don't want to update the registry with these old
> images...
> 
> What is the plan then, hold the stable tag until we figure out
> the best fix?
> 
> Otherwise Michael can use 'git-push --push-option=ci.skip' to
> not trigger a CI pipeline when pushing stable tags (running
> CI pipelines previously in his own gitlab namespace).

Yes, I'd suggest that approach currently, until we figure out a
real long term solution.


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v4 0/8] qapi: Add support for aliases

2021-10-27 Thread Kevin Wolf

Am 26.10.2021 um 23:23 hat John Snow geschrieben:
> On Fri, Sep 17, 2021 at 12:13 PM Kevin Wolf  wrote:
> 
> > This series introduces alias definitions for QAPI object types (structs
> > and unions).
> >
> > This allows using the same QAPI type and visitor even when the syntax
> > has some variations between different external interfaces such as QMP
> > and the command line.
> >
> >
> I'm absurdly late to the party here, and I'm afraid my involvement may only
> stall your progress even further, but ...

I'm not sure if it's worth investing much of your time in this. After a
year of discussing implementation details, Markus decided that he
doesn't like the whole approach, so the series is probably dead in this
shape. Maybe parts of it (possibly even large parts) can be saved and
used in a modified approach, depending on how radically different the
new approach is.

Markus promised that he'll think of alternative approaches to solve the
problem. I'm waiting for that before I waste more time implementing
something else that is also dead from the start.

> can you fill me in on the slightly-less-higher-level details here?
> 
> I'm curious as to the nature of "has some variations" and how the aliases
> help alleviate them. Do you accomplish that compatibility by using
> different names for different fields of the struct?

> so e.g. x.foo can be used as an alias for x.bar, but both map ultimately
> onto 'x.foo'? Or does this series provide for some more fundamental
> mechanical changes to map one type onto another?

I would recommend reading the changes to docs/devel/qapi-code-gen.rst in
patch 7, which explain the implemented mechanism.

For -chardev, most of "some varations" are different names for the same
member of a struct, or nesting in QMP that you don't want to have on the
command line. I went through all cases in one of the last messages in
the v3 thread, but I think these are the two important categories of
cases.

The C types stay the same as they have always been, aliases just provide
an alternative way to specify the same thing in the input.

> > It also provides a new tool for evolving the schema while maintaining
> > backwards compatibility (possibly during a deprecation period).
> >
> > The first user is intended to be a QAPIfied -chardev command line
> > option, for which I'll send a separate series. A git tag is available
> > that contains both this series and the chardev changes that make use of
> > it:
> >
> > https://repo.or.cz/qemu/kevin.git qapi-alias-chardev-v4

You may also want to have a look at this, and specifically
qapi/char.json, to see how I used aliases in practice.

Kevin

Re: [RFC 0/2] tls: add macros for coroutine-safe TLS variables

2021-10-27 Thread Kevin Wolf

Am 26.10.2021 um 19:10 hat Richard Henderson geschrieben:
> On 10/26/21 9:34 AM, Stefan Hajnoczi wrote:
> > On Tue, Oct 26, 2021 at 08:10:16AM -0700, Richard Henderson wrote:
> > > On 10/26/21 6:22 AM, Stefan Hajnoczi wrote:
> > > > If "safe" TLS variables are opt-in then we'll likely have obscure bugs
> > > > when code changes to access a TLS variable that was previously never
> > > > accessed from a coroutine. There is no compiler error and no way to
> > > > detect this. When it happens debugging it is painful.
> > > 
> > > Co-routines are never used in user-only builds.
> > 
> > If developers have the choice of using __thread then bugs can slip
> > through.
> 
> Huh?  How.  No, really.
> 
> > Are you concerned about performance, the awkwardness of calling
> > getters/setters, or something else for qemu-user?
> 
> Awkwardness first, performance second.
> 
> I'll also note that coroutines never run on vcpu threads, only io threads.
> So I'll resist any use of these interfaces in TCG as well.

This is wrong. Device emulation is called from vcpu threads, and it
calls into code using coroutines. Using dedicated iothreads is
possible for some devices, but not the default.

In fact, this is probably where the most visible case of the bug comes
from: A coroutine is created in the vcpu thread (while holding the BQL)
and after yielding first, it is subsequently reentered from the main
thread. This is exactly the same pattern as you often get with
callbacks, where the request is made in the vcpu thread and the callback
is run in the main thread.

Kevin

Re: [PATCH 04/33] tests/tcg/mips: Run MSA opcodes tests on user-mode emulation

2021-10-27 Thread Philippe Mathieu-Daudé

On 10/23/21 23:47, Philippe Mathieu-Daudé wrote:
> The following commits added various user-mode tests
> for various MSA instructions:
> 
>  - 0fdd986a6c8 ("Add tests for MSA integer add instructions")
>  - 1be82d89011 ("Add tests for MSA integer average instructions")
>  - 1d336c87a3c ("Add tests for MSA bit set instructions")
>  - 1e6bea794c8 ("Add tests for MSA integer max/min instructions")
>  - 2a367db039f ("Add tests for MSA pack instructions")
>  - 3d9569b8550 ("Add tests for MSA move instructions")
>  - 4b302ce90db ("Add tests for MSA integer multiply instructions")
>  - 520e210c0aa ("Add tests for MSA integer compare instructions")
>  - 53e116fed6d ("Add tests for MSA integer subtract instructions")
>  - 666952ea7c1 ("Add tests for MSA bit move instructions")
>  - 72f463bc080 ("Add tests for MSA integer divide instructions")
>  - 8598f5fac1c ("Add tests for MSA FP max/min instructions")
>  - 99d423e576a ("Add tests for MSA shift instructions")
>  - a8f91dd9fd0 ("Add tests for MSA integer dot product instructions")
>  - b62592ab655 ("Add tests for MSA bit counting instructions")
>  - ba632924450 ("Add tests for MSA logic instructions")
>  - fc76f486677 ("Add tests for MSA interleave instructions")
> 
> Cover them in the buildsys machinery so they are run automatically
> when calling 'make check-tcg'.
> 
> Start running them on the mips64el target.
> 
> Cc: Alex Bennée 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  tests/tcg/mips/ase-msa.mak | 30 ++
>  MAINTAINERS|  1 +
>  tests/tcg/mips/Makefile.target |  5 +
>  tests/tcg/mips64/Makefile.target   |  9 +
>  tests/tcg/mips64el/Makefile.target | 12 
>  tests/tcg/mipsel/Makefile.target   |  9 +
>  6 files changed, 66 insertions(+)
>  create mode 100644 tests/tcg/mips/ase-msa.mak
>  create mode 100644 tests/tcg/mips64/Makefile.target
>  create mode 100644 tests/tcg/mips64el/Makefile.target
>  create mode 100644 tests/tcg/mipsel/Makefile.target
> 
> diff --git a/tests/tcg/mips/ase-msa.mak b/tests/tcg/mips/ase-msa.mak
> new file mode 100644
> index 000..be1ba967a5b
> --- /dev/null
> +++ b/tests/tcg/mips/ase-msa.mak
> @@ -0,0 +1,30 @@
> +# -*- Mode: makefile -*-
> +#
> +# MIPS MSA specific TCG tests
> +#
> +# Copyright (c) 2021 Philippe Mathieu-Daudé 
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-later
> +
> +MSA_DIR = $(SRC_PATH)/tests/tcg/mips/user/ase/msa
> +
> +MSA_TEST_CLASS = bit-count bit-move bit-set fixed-multiply \
> + float-max-min int-add int-average int-compare 
> int-divide \
> + int-dot-product interleave int-max-min 
> int-modulo \
> + int-multiply int-subtract logic move pack shift
> +
> +MSA_TEST_SRCS = $(foreach class,$(MSA_TEST_CLASS),$(wildcard 
> $(MSA_DIR)/$(class)/*.c))
> +
> +MSA_TESTS = $(patsubst %.c,%,$(notdir $(MSA_TEST_SRCS)))
> +
> +$(MSA_TESTS): CFLAGS+=-mmsa $(MSA_CFLAGS)
> +$(MSA_TESTS): %: $(foreach CLASS,$(MSA_TEST_CLASS),$(wildcard 
> $(MSA_DIR)/$(CLASS)/%.c))
> + $(CC) -static $(CFLAGS) -o $@ \
> + $(foreach CLASS,$(MSA_TEST_CLASS),$(wildcard 
> $(MSA_DIR)/$(CLASS)/$@.c))

FYI I am using $wilcard because because the test files are in multiple
directories ($MSA_TEST_CLASS).

> +
> +$(foreach test,$(MSA_TESTS),run-$(test)): QEMU_OPTS += -cpu $(MSA_CPU)
> +
> +# FIXME: These tests fail when using plugins
> +ifneq ($(CONFIG_PLUGIN),y)
> +TESTS += $(MSA_TESTS)
> +endif
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4e77d03651b..53c6c549b80 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3111,6 +3111,7 @@ R: Jiaxun Yang 
>  R: Aleksandar Rikalo 
>  S: Odd Fixes
>  F: tcg/mips/
> +F: tests/tcg/mips*
>  
>  PPC TCG target
>  M: Richard Henderson 
> diff --git a/tests/tcg/mips/Makefile.target b/tests/tcg/mips/Makefile.target
> index 1a994d5525e..191fe179119 100644
> --- a/tests/tcg/mips/Makefile.target
> +++ b/tests/tcg/mips/Makefile.target
> @@ -17,3 +17,8 @@ TESTS += $(MIPS_TESTS)
>  hello-mips: CFLAGS+=-mno-abicalls -fno-PIC -mabi=32
>  hello-mips: LDFLAGS+=-nostdlib
>  endif
> +
> +# FIXME enable MSA tests

This is commented because the Debian toolchain produces:

/usr/mips-linux-gnu/include/gnu/stubs.h:17:11: fatal error:
gnu/stubs-o32_hard_2008.h: No such file or directory
 # include 
   ^~~

> +#MSA_CFLAGS=-march=mips64r5 -mnan=2008

Here I meant:

   MSA_CFLAGS=-march=mips32r5 -mnan=2008

Anyhow, similar error using -march=mips32r6.

> +#MSA_CPU=P5600
> +#include $(SRC_PATH)/tests/tcg/mips/ase-msa.mak
> diff --git a/tests/tcg/mips64/Makefile.target 
> b/tests/tcg/mips64/Makefile.target
> new file mode 100644
> index 000..d876b92f219
> --- /dev/null
> +++ b/tests/tcg/mips64/Makefile.target
> @@ -0,0 +1,9 @@
> +# -*- Mode: makefile -*-
> +#
> +# mips64el specific TCG tests
> +#
> +# Copyright (c) 2021 Philippe Mathieu-Daudé 
> +#
> +# SPDX-License-Identifier: GPL-2.0-or-l

Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)

2021-10-27 Thread Christophe Leroy


Hi Cédric,

Le 27/10/2021 à 10:40, Cédric Le Goater a écrit :

I am raising this because the nanoMIPS support is in deprecated state
since more than 2 releases, but it is still in-tree and I try to keep
it functional. However, since no toolchain reached mainstream GCC/LLVM
it is not easy to maintain. By keeping it in that state we give some
time to other communities to have their toolchain upstreamed / merged.


If you're trying to keep it functional and aren't going to remove
it, then it shouldn't be marked deprecated.


OK, I'll move it back to Odd-fixes.


The ppc405 boards are still in pretty bad shape. We need a patched u-boot,
a patched QEMU and a patched Linux and still, we do not reach user space
without some sort of crash.



I guess Philippe was talking about the nanoMIPS here.

By the way, ppc405 is not on an optimal shape yet for sure, but we are 
getting better and better, and I'm aiming at getting it back to work, 
just because I need it.


By the way, were you able to re-test with CONFIG_MATH_EMULATION enabled 
after your last oops report which shows that you're trying to execute 
floating points instruction ?


Thanks
Christophe

Re: Deprecate the ppc405 boards in QEMU? (was: [PATCH v3 4/7] MAINTAINERS: Orphan obscure ppc platforms)

2021-10-27 Thread Philippe Mathieu-Daudé

On 10/27/21 12:42, Christophe Leroy wrote:
> Le 27/10/2021 à 10:40, Cédric Le Goater a écrit :
> I am raising this because the nanoMIPS support is in deprecated state
> since more than 2 releases, but it is still in-tree and I try to keep
> it functional. However, since no toolchain reached mainstream GCC/LLVM
> it is not easy to maintain. By keeping it in that state we give some
> time to other communities to have their toolchain upstreamed / merged.

 If you're trying to keep it functional and aren't going to remove
 it, then it shouldn't be marked deprecated.
>>>
>>> OK, I'll move it back to Odd-fixes.
>>
>> The ppc405 boards are still in pretty bad shape. We need a patched
>> u-boot,
>> a patched QEMU and a patched Linux and still, we do not reach user space
>> without some sort of crash.
>>
> 
> I guess Philippe was talking about the nanoMIPS here.

Yes, I was following the deprecation thread, sorry for the confusion.

Re: [PATCH] e1000e: Added ICR clearing by corresponding IMS bit.

2021-10-27 Thread Andrew Melnichenko

Hi,
Let's make things clear.
At first, I've decided to fix the issue in the linux e1000e driver.
(
https://lists.osuosl.org/pipermail/intel-wired-lan/Week-of-Mon-20200413/019497.html
)
Original driver developers suggest to fix the issue on qemu and assures
that driver works correctly on real hw.
I've added fix according to the note 13.3.28 of the 8257X manual
(
https://www.intel.com/content/dam/www/public/us/en/documents/manuals/pcie-gbe-controllers-open-source-manual.pdf
)
I've reference to 8257X spec which is an apparently a bit different to
82574l-gbe-controller-datasheet.pdf


On Thu, Oct 21, 2021 at 5:16 AM Jason Wang  wrote:

> Hi Andrew:
>
> On Thu, Oct 21, 2021 at 6:27 AM Andrew Melnichenko 
> wrote:
> >
> > Hi,
> > I've used this manual(
> https://www.intel.com/content/dam/www/public/us/en/documents/manuals/pcie-gbe-controllers-open-source-manual.pdf
> )
> > It was provided by Intel when I've tried to research that bug.
> > Although it's a bit newer manual - the article is 13.3.28.
>
> Note that it's not the model that e1000e tries to implement (82574L).
> The device ID in qemu is 0x10D3 which is not listed in the above link
> "4.7.7 Mandatory PCI Configuration Registers".
>
> Thanks
>
> >
> >
> > On Tue, Oct 19, 2021 at 10:56 AM Jason Wang  wrote:
> >>
> >> On Thu, Oct 14, 2021 at 4:34 PM Andrew Melnichenko 
> wrote:
> >> >
> >> > Ping
> >> >
> >> > On Wed, Aug 18, 2021 at 9:10 PM Andrew Melnychenko 
> wrote:
> >> >>
> >> >> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1707441
> >> >>
> >> >> The issue is in LSC clearing. So, after "link up"(during
> initialization),
> >> >> the next LSC event is masked and can't be processed.
> >> >> Technically, the event should be 'cleared' during ICR read.
> >> >> On Windows guest, everything works well, mostly because of
> >> >> different interrupt routines(ICR clears during register write).
> >> >> So, added ICR clearing during reading, according to the note by
> >> >> section 13.3.27 of the 8257X developers manual.
> >> >>
> >> >> Signed-off-by: Andrew Melnychenko 
> >> >> ---
> >> >>  hw/net/e1000e_core.c | 10 ++
> >> >>  hw/net/trace-events  |  1 +
> >> >>  2 files changed, 11 insertions(+)
> >> >>
> >> >> diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
> >> >> index b75f2ab8fc..288897a975 100644
> >> >> --- a/hw/net/e1000e_core.c
> >> >> +++ b/hw/net/e1000e_core.c
> >> >> @@ -2617,6 +2617,16 @@ e1000e_mac_icr_read(E1000ECore *core, int
> index)
> >> >>  e1000e_clear_ims_bits(core, core->mac[IAM]);
> >> >>  }
> >> >>
> >> >> +/*
> >> >> + * PCIe* GbE Controllers Open Source Software Developer's Manual
> >> >> + * 13.3.27 Interrupt Cause Read Register
> >>
> >> Per link in the beginning of this file it should be 82574l I guess?
> >>
> >> If yes, I'm using revision 3.4 and it's 13.3.27 is not about ICR.
> >>
> >> What it said are:
> >>
> >> "
> >> In MSI-X mode the bits in this register can be configured to
> >> auto-clear when the MSI-X interrupt message is sent, in order to
> >> minimize driver overhead, and when using MSI-X interrupt signaling. In
> >> systems that do not support MSI-X, reading the ICR register clears
> >> it's bits or writing 1b's clears the corresponding bits in this
> >> register.
> >> "
> >>
> >>
> >> >> + */
> >> >> +if ((core->mac[ICR] & E1000_ICR_ASSERTED) &&
> >> >> +(core->mac[ICR] & core->mac[IMS])) {
> >> >> +trace_e1000e_irq_icr_clear_icr_bit_ims(core->mac[ICR],
> core->mac[IMS]);
> >> >> +core->mac[ICR] = 0;
> >> >> +}
> >>
> >> Thanks
> >>
> >> >> +
> >> >>  trace_e1000e_irq_icr_read_exit(core->mac[ICR]);
> >> >>  e1000e_update_interrupt_state(core);
> >> >>  return ret;
> >> >> diff --git a/hw/net/trace-events b/hw/net/trace-events
> >> >> index c28b91ee1a..15fd09aa1c 100644
> >> >> --- a/hw/net/trace-events
> >> >> +++ b/hw/net/trace-events
> >> >> @@ -225,6 +225,7 @@ e1000e_irq_icr_read_entry(uint32_t icr)
> "Starting ICR read. Current ICR: 0x%x"
> >> >>  e1000e_irq_icr_read_exit(uint32_t icr) "Ending ICR read. Current
> ICR: 0x%x"
> >> >>  e1000e_irq_icr_clear_zero_ims(void) "Clearing ICR on read due to
> zero IMS"
> >> >>  e1000e_irq_icr_clear_iame(void) "Clearing ICR on read due to IAME"
> >> >> +e1000e_irq_icr_clear_icr_bit_ims(uint32_t icr, uint32_t ims)
> "Clearing ICR on read due corresponding IMS bit: 0x%x & 0x%x"
> >> >>  e1000e_irq_iam_clear_eiame(uint32_t iam, uint32_t cause) "Clearing
> IMS due to EIAME, IAM: 0x%X, cause: 0x%X"
> >> >>  e1000e_irq_icr_clear_eiac(uint32_t icr, uint32_t eiac) "Clearing
> ICR bits due to EIAC, ICR: 0x%X, EIAC: 0x%X"
> >> >>  e1000e_irq_ims_clear_set_imc(uint32_t val) "Clearing IMS bits due
> to IMC write 0x%x"
> >> >> --
> >> >> 2.31.1
> >> >>
> >>
>
>

Re: [PATCH v5 2/8] python/machine: Handle QMP errors on close more meticulously

2021-10-27 Thread Kevin Wolf

Am 26.10.2021 um 19:56 hat John Snow geschrieben:
> To use the AQMP backend, Machine just needs to be a little more diligent
> about what happens when closing a QMP connection. The operation is no
> longer a freebie in the async world; it may return errors encountered in
> the async bottom half on incoming message receipt, etc.
> 
> (AQMP's disconnect, ultimately, serves as the quiescence point where all
> async contexts are gathered together, and any final errors reported at
> that point.)
> 
> Because async QMP continues to check for messages asynchronously, it's
> almost certainly likely that the loop will have exited due to EOF after
> issuing the last 'quit' command. That error will ultimately be bubbled
> up when attempting to close the QMP connection. The manager class here
> then is free to discard it -- if it was expected.
> 
> Signed-off-by: John Snow 
> Reviewed-by: Hanna Reitz 
> ---
>  python/qemu/machine/machine.py | 48 +-
>  1 file changed, 42 insertions(+), 6 deletions(-)
> 
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index 0bd40bc2f76..a0cf69786b4 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -342,9 +342,15 @@ def _post_shutdown(self) -> None:
>  # Comprehensive reset for the failed launch case:
>  self._early_cleanup()
>  
> -if self._qmp_connection:
> -self._qmp.close()
> -self._qmp_connection = None
> +try:
> +self._close_qmp_connection()
> +except Exception as err:  # pylint: disable=broad-except
> +LOG.warning(
> +"Exception closing QMP connection: %s",
> +str(err) if str(err) else type(err).__name__
> +)
> +finally:
> +assert self._qmp_connection is None
>  
>  self._close_qemu_log_file()
>  
> @@ -420,6 +426,31 @@ def _launch(self) -> None:
> close_fds=False)
>  self._post_launch()
>  
> +def _close_qmp_connection(self) -> None:
> +"""
> +Close the underlying QMP connection, if any.
> +
> +Dutifully report errors that occurred while closing, but assume
> +that any error encountered indicates an abnormal termination
> +process and not a failure to close.
> +"""
> +if self._qmp_connection is None:
> +return
> +
> +try:
> +self._qmp.close()
> +except EOFError:
> +# EOF can occur as an Exception here when using the Async
> +# QMP backend. It indicates that the server closed the
> +# stream. If we successfully issued 'quit' at any point,
> +# then this was expected. If the remote went away without
> +# our permission, it's worth reporting that as an abnormal
> +# shutdown case.
> +if not (self._user_killed or self._quit_issued):
> +raise

Isn't this racy for those tests that expect QEMU to quit by itself and
then later call wait()? self._quit_issued is only set to True in wait(),
but whatever will cause QEMU to quit happens earlier and it might
actually quit before wait() is called.

It would make sense to me that such tests need to declare that they
expect QEMU to quit before actually performing the action. And then
wait() becomes less weird in patch 1, too, because it can just assert
self._quit_issued instead of unconditionally setting it.


The other point I'm unsure is whether you can actually kill QEMU without
getting either self._user_killed or self._quit_issued set. The
potentially problematic case I see is shutdown(hard = False) where soft
shutdown fails. Then a hard shutdown will be performed without setting
self._user_killed (is this a bug?).

Of course, sending the 'quit' command in _soft_shutdown() will set
self._quit_issued at least, but are we absolutely sure that it can never
raise an exception before getting to qmp()? I guess in theory, closing
the console socket in _early_cleanup() could raise one? (But either way,
not relying on such subtleties would make the code easier to
understand.)

Kevin

Re: [PATCH 1/1] gitlab-ci: Only push docker images to registry from /master branch

2021-10-27 Thread Gerd Hoffmann

  Hi,

> This scenario a tricky problem, and I'm not seeing an easy answer to it
> so far.

Only exclude stable branches?

take care,
  Gerd

[PATCH v8 1/8] virtio: drop name parameter for virtio_init()

2021-10-27 Thread Jonah Palmer

This patch drops the name parameter for the virtio_init function.

The pair between the numeric device ID and the string device ID
(name) of a virtio device already exists, but not in a way that
lets us map between them.

This patch will let us do this and removes the need for the name
parameter in virtio_init().

Signed-off-by: Jonah Palmer 
---
 hw/9pfs/virtio-9p-device.c  |  2 +-
 hw/block/vhost-user-blk.c   |  2 +-
 hw/block/virtio-blk.c   |  2 +-
 hw/char/virtio-serial-bus.c |  4 +--
 hw/display/virtio-gpu-base.c|  2 +-
 hw/input/virtio-input.c |  3 +-
 hw/net/virtio-net.c |  2 +-
 hw/scsi/virtio-scsi.c   |  3 +-
 hw/virtio/vhost-user-fs.c   |  3 +-
 hw/virtio/vhost-user-i2c.c  |  6 +---
 hw/virtio/vhost-user-rng.c  |  2 +-
 hw/virtio/vhost-user-vsock.c|  2 +-
 hw/virtio/vhost-vsock-common.c  |  4 +--
 hw/virtio/vhost-vsock.c |  2 +-
 hw/virtio/virtio-balloon.c  |  3 +-
 hw/virtio/virtio-crypto.c   |  2 +-
 hw/virtio/virtio-iommu.c|  3 +-
 hw/virtio/virtio-mem.c  |  3 +-
 hw/virtio/virtio-pmem.c |  3 +-
 hw/virtio/virtio-rng.c  |  2 +-
 hw/virtio/virtio.c  | 45 +++--
 include/hw/virtio/vhost-vsock-common.h  |  2 +-
 include/hw/virtio/virtio-gpu.h  |  3 +-
 include/hw/virtio/virtio.h  |  3 +-
 include/standard-headers/linux/virtio_ids.h |  1 +
 25 files changed, 68 insertions(+), 41 deletions(-)

diff --git a/hw/9pfs/virtio-9p-device.c b/hw/9pfs/virtio-9p-device.c
index 54ee93b..5f522e6 100644
--- a/hw/9pfs/virtio-9p-device.c
+++ b/hw/9pfs/virtio-9p-device.c
@@ -216,7 +216,7 @@ static void virtio_9p_device_realize(DeviceState *dev, 
Error **errp)
 }
 
 v->config_size = sizeof(struct virtio_9p_config) + strlen(s->fsconf.tag);
-virtio_init(vdev, "virtio-9p", VIRTIO_ID_9P, v->config_size);
+virtio_init(vdev, VIRTIO_ID_9P, v->config_size);
 v->vq = virtio_add_queue(vdev, MAX_REQ, handle_9p_output);
 }
 
diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index ba13cb8..f61f8c1 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -490,7 +490,7 @@ static void vhost_user_blk_device_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
-virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
+virtio_init(vdev, VIRTIO_ID_BLOCK,
 sizeof(struct virtio_blk_config));
 
 s->virtqs = g_new(VirtQueue *, s->num_queues);
diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index f139cd7..505e574 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -1213,7 +1213,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 
 virtio_blk_set_config_size(s, s->host_features);
 
-virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK, s->config_size);
+virtio_init(vdev, VIRTIO_ID_BLOCK, s->config_size);
 
 s->blk = conf->conf.blk;
 s->rq = NULL;
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index f01ec21..232f4c9 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -1044,8 +1044,8 @@ static void virtio_serial_device_realize(DeviceState 
*dev, Error **errp)
 VIRTIO_CONSOLE_F_EMERG_WRITE)) {
 config_size = offsetof(struct virtio_console_config, emerg_wr);
 }
-virtio_init(vdev, "virtio-serial", VIRTIO_ID_CONSOLE,
-config_size);
+
+virtio_init(vdev, VIRTIO_ID_CONSOLE, config_size);
 
 /* Spawn a new virtio-serial bus on which the ports will ride as devices */
 qbus_init(&vser->bus, sizeof(vser->bus), TYPE_VIRTIO_SERIAL_BUS,
diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index c8da480..5411a7b 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -170,7 +170,7 @@ virtio_gpu_base_device_realize(DeviceState *qdev,
 }
 
 g->virtio_config.num_scanouts = cpu_to_le32(g->conf.max_outputs);
-virtio_init(VIRTIO_DEVICE(g), "virtio-gpu", VIRTIO_ID_GPU,
+virtio_init(VIRTIO_DEVICE(g), VIRTIO_ID_GPU,
 sizeof(struct virtio_gpu_config));
 
 if (virtio_gpu_virgl_enabled(g->conf)) {
diff --git a/hw/input/virtio-input.c b/hw/input/virtio-input.c
index 54bcb46..5b5398b 100644
--- a/hw/input/virtio-input.c
+++ b/hw/input/virtio-input.c
@@ -257,8 +257,7 @@ static void virtio_input_device_realize(DeviceState *dev, 
Error **errp)
 vinput->cfg_size += 8;
 assert(vinput->cfg_size <= sizeof(virtio_input_config));
 
-virtio_init(vdev, "virtio-input", VIRTIO_ID_INPUT,
-vinput->cfg_size);
+virtio_init(vdev, VIRTIO_ID_INPUT, vinput->cfg_size);
 vinput->evt

[PATCH v8 7/8] qmp: add QMP command x-debug-virtio-queue-element

2021-10-27 Thread Jonah Palmer

From: Laurent Vivier 

This new command shows the information of a VirtQueue element.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-stub.c |   9 +++
 hw/virtio/virtio.c  | 154 
 qapi/virtio.json| 204 
 3 files changed, 367 insertions(+)

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index 387803d..6c282b3 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -31,3 +31,12 @@ VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char 
*path,
 {
 return qmp_virtio_unsupported(errp);
 }
+
+VirtioQueueElement *qmp_x_debug_virtio_queue_element(const char *path,
+ uint16_t queue,
+ bool has_index,
+ uint16_t index,
+ Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7fd98c5..8c8a987 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -480,6 +480,19 @@ static inline void vring_used_write(VirtQueue *vq, 
VRingUsedElem *uelem,
 address_space_cache_invalidate(&caches->used, pa, sizeof(VRingUsedElem));
 }
 
+/* Called within rcu_read_lock(). */
+static inline uint16_t vring_used_flags(VirtQueue *vq)
+{
+VRingMemoryRegionCaches *caches = vring_get_region_caches(vq);
+hwaddr pa = offsetof(VRingUsed, flags);
+
+if (!caches) {
+return 0;
+}
+
+return virtio_lduw_phys_cached(vq->vdev, &caches->used, pa);
+}
+
 /* Called within rcu_read_lock().  */
 static uint16_t vring_used_idx(VirtQueue *vq)
 {
@@ -4387,6 +4400,147 @@ VirtQueueStatus *qmp_x_debug_virtio_queue_status(const 
char *path,
 return status;
 }
 
+static VirtioRingDescFlagsList *qmp_decode_vring_desc_flags(uint16_t flags)
+{
+VirtioRingDescFlagsList *list = NULL;
+VirtioRingDescFlagsList *node;
+int i;
+
+struct {
+uint16_t flag;
+VirtioRingDescFlags value;
+} map[] = {
+{ VRING_DESC_F_NEXT, VIRTIO_RING_DESC_FLAGS_NEXT },
+{ VRING_DESC_F_WRITE, VIRTIO_RING_DESC_FLAGS_WRITE },
+{ VRING_DESC_F_INDIRECT, VIRTIO_RING_DESC_FLAGS_INDIRECT },
+{ 1 << VRING_PACKED_DESC_F_AVAIL, VIRTIO_RING_DESC_FLAGS_AVAIL },
+{ 1 << VRING_PACKED_DESC_F_USED, VIRTIO_RING_DESC_FLAGS_USED },
+{ 0, -1 }
+};
+
+for (i = 0; map[i].flag; i++) {
+if ((map[i].flag & flags) == 0) {
+continue;
+}
+node = g_malloc0(sizeof(VirtioRingDescFlagsList));
+node->value = map[i].value;
+node->next = list;
+list = node;
+}
+
+return list;
+}
+
+VirtioQueueElement *qmp_x_debug_virtio_queue_element(const char *path,
+ uint16_t queue,
+ bool has_index,
+ uint16_t index,
+ Error **errp)
+{
+VirtIODevice *vdev;
+VirtQueue *vq;
+VirtioQueueElement *element = NULL;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIO device", path);
+return NULL;
+}
+
+if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) {
+error_setg(errp, "Invalid virtqueue number %d", queue);
+return NULL;
+}
+vq = &vdev->vq[queue];
+
+if (virtio_vdev_has_feature(vdev, VIRTIO_F_RING_PACKED)) {
+error_setg(errp, "Packed ring not supported");
+return NULL;
+} else {
+unsigned int head, i, max;
+VRingMemoryRegionCaches *caches;
+MemoryRegionCache indirect_desc_cache = MEMORY_REGION_CACHE_INVALID;
+MemoryRegionCache *desc_cache;
+VRingDesc desc;
+VirtioRingDescList *list = NULL;
+VirtioRingDescList *node;
+int rc;
+
+RCU_READ_LOCK_GUARD();
+
+max = vq->vring.num;
+
+if (!has_index) {
+head = vring_avail_ring(vq, vq->last_avail_idx % vq->vring.num);
+} else {
+head = vring_avail_ring(vq, index % vq->vring.num);
+}
+i = head;
+
+caches = vring_get_region_caches(vq);
+if (!caches) {
+error_setg(errp, "Region caches not initialized");
+return NULL;
+}
+if (caches->desc.len < max * sizeof(VRingDesc)) {
+error_setg(errp, "Cannot map descriptor ring");
+return NULL;
+}
+
+desc_cache = &caches->desc;
+vring_split_desc_read(vdev, &desc, desc_cache, i);
+if (desc.flags & VRING_DESC_F_INDIRECT) {
+int64_t len;
+len = address_space_cache_init(&indirect_desc_cache, vdev->dma_as,
+   desc.

[PATCH v8 0/8] hmp,qmp: Add commands to introspect virtio devices

2021-10-27 Thread Jonah Palmer

This series introduces new QMP/HMP commands to dump the status of a
virtio device at different levels.

[Jonah: Rebasing previous patchset from Oct. 5 (v7). Original patches
 are from Laurent Vivier from May 2020.

 Rebase from v7 to v8 includes an additional assert to make sure
 we're not returning NULL in virtio_id_to_name(). Rebase also
 includes minor additions/edits to qapi/virtio.json.]

1. Main command

HMP Only:

virtio [subcommand]

Example:

List all sub-commands:

(qemu) virtio
virtio query  -- List all available virtio devices
virtio status path -- Display status of a given virtio device
virtio queue-status path queue -- Display status of a given virtio queue
virtio vhost-queue-status path queue -- Display status of a given vhost 
queue
virtio queue-element path queue [index] -- Display element of a given 
virtio queue

2. List available virtio devices in the machine

HMP Form:

virtio query

Example:

(qemu) virtio query
/machine/peripheral/vsock0/virtio-backend [vhost-vsock]
/machine/peripheral/crypto0/virtio-backend [virtio-crypto]
/machine/peripheral-anon/device[2]/virtio-backend [virtio-scsi]
/machine/peripheral-anon/device[1]/virtio-backend [virtio-net]
/machine/peripheral-anon/device[0]/virtio-backend [virtio-serial]

QMP Form:

{ 'command': 'x-debug-query-virtio', 'returns': ['VirtioInfo'] }

Example:

-> { "execute": "x-debug-query-virtio" }
<- { "return": [
{
"path": "/machine/peripheral/vsock0/virtio-backend",
"type": "vhost-vsock"
},
{
"path": "/machine/peripheral/crypto0/virtio-backend",
"type": "virtio-crypto"
},
{
"path": "/machine/peripheral-anon/device[2]/virtio-backend",
"type": "virtio-scsi"
},
{
"path": "/machine/peripheral-anon/device[1]/virtio-backend",
"type": "virtio-net"
},
{
"path": "/machine/peripheral-anon/device[0]/virtio-backend",
"type": "virtio-serial"
}
 ]
   }

3. Display status of a given virtio device

HMP Form:

virtio status 

Example:

(qemu) virtio status /machine/peripheral/vsock0/virtio-backend
/machine/peripheral/vsock0/virtio-backend:
device_name: vhost-vsock (vhost)
device_id:   19
vhost_started:   true
bus_name:(null)
broken:  false
disabled:false
disable_legacy_check:false
started: true
use_started: true
start_on_kick:   false
use_guest_notifier_mask: true
vm_running:  true
num_vqs: 3
queue_sel:   2
isr: 0
endianness:  little
status: acknowledge, driver, features-ok, driver-ok
Guest features:   event-idx, indirect-desc, version-1
Host features:protocol-features, event-idx, indirect-desc, 
version-1, any-layout,
  notify-on-empty
Backend features: 
VHost:
nvqs:   2
vq_index:   0
max_queues: 0
n_mem_sections: 4
n_tmp_sections: 4
backend_cap:0
log_enabled:false
log_size:   0
Features:  event-idx, indirect-desc, version-1, 
any-layout, notify-on-empty,
   log-all
Acked features:event-idx, indirect-desc, version-1
Backend features:  
Protocol features:

QMP Form:

{ 'command': 'x-debug-virtio-status',
  'data': { 'path': 'str' },
  'returns': 'VirtioStatus'
}

Example:

-> { "execute": "x-debug-virtio-status",
 "arguments": {
"path": "/machine/peripheral/vsock0/virtio-backend"
 }
   }
<- { "return": {
"device-endian": "little",
"bus-name": "",
"disable-legacy-check": false,
"name": "vhost-vsock",
"started": true,
"device-id": 19,
"vhost-dev": {
"n-tmp-sections": 4,
"n-mem-sections": 4,
"max-queues": 0,
"backend-cap": 0,
"log-size": 0,
"backend-features": {

[PATCH v8 2/8] virtio: add vhost support for virtio devices

2021-10-27 Thread Jonah Palmer

This patch adds a get_vhost() callback function for VirtIODevices that
returns the device's corresponding vhost_dev structure if the vhost
device is running. This patch also adds a vhost_started flag for VirtIODevices.

Previously, a VirtIODevice wouldn't be able to tell if its corresponding
vhost device was active or not.

Signed-off-by: Jonah Palmer 
---
 hw/block/vhost-user-blk.c  |  7 +++
 hw/display/vhost-user-gpu.c|  7 +++
 hw/input/vhost-user-input.c|  7 +++
 hw/net/virtio-net.c|  9 +
 hw/scsi/vhost-scsi.c   |  8 
 hw/virtio/vhost-user-fs.c  |  7 +++
 hw/virtio/vhost-user-rng.c |  7 +++
 hw/virtio/vhost-vsock-common.c |  7 +++
 hw/virtio/vhost.c  |  3 +++
 hw/virtio/virtio-crypto.c  | 10 ++
 hw/virtio/virtio.c |  1 +
 include/hw/virtio/virtio.h |  3 +++
 12 files changed, 76 insertions(+)

diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
index f61f8c1..b059da1 100644
--- a/hw/block/vhost-user-blk.c
+++ b/hw/block/vhost-user-blk.c
@@ -568,6 +568,12 @@ static void vhost_user_blk_instance_init(Object *obj)
   "/disk@0,0", DEVICE(obj));
 }
 
+static struct vhost_dev *vhost_user_blk_get_vhost(VirtIODevice *vdev)
+{
+VHostUserBlk *s = VHOST_USER_BLK(vdev);
+return &s->dev;
+}
+
 static const VMStateDescription vmstate_vhost_user_blk = {
 .name = "vhost-user-blk",
 .minimum_version_id = 1,
@@ -602,6 +608,7 @@ static void vhost_user_blk_class_init(ObjectClass *klass, 
void *data)
 vdc->get_features = vhost_user_blk_get_features;
 vdc->set_status = vhost_user_blk_set_status;
 vdc->reset = vhost_user_blk_reset;
+vdc->get_vhost = vhost_user_blk_get_vhost;
 }
 
 static const TypeInfo vhost_user_blk_info = {
diff --git a/hw/display/vhost-user-gpu.c b/hw/display/vhost-user-gpu.c
index 49df56c..6e93b46 100644
--- a/hw/display/vhost-user-gpu.c
+++ b/hw/display/vhost-user-gpu.c
@@ -565,6 +565,12 @@ vhost_user_gpu_device_realize(DeviceState *qdev, Error 
**errp)
 g->vhost_gpu_fd = -1;
 }
 
+static struct vhost_dev *vhost_user_gpu_get_vhost(VirtIODevice *vdev)
+{
+VhostUserGPU *g = VHOST_USER_GPU(vdev);
+return &g->vhost->dev;
+}
+
 static Property vhost_user_gpu_properties[] = {
 VIRTIO_GPU_BASE_PROPERTIES(VhostUserGPU, parent_obj.conf),
 DEFINE_PROP_END_OF_LIST(),
@@ -586,6 +592,7 @@ vhost_user_gpu_class_init(ObjectClass *klass, void *data)
 vdc->guest_notifier_pending = vhost_user_gpu_guest_notifier_pending;
 vdc->get_config = vhost_user_gpu_get_config;
 vdc->set_config = vhost_user_gpu_set_config;
+vdc->get_vhost = vhost_user_gpu_get_vhost;
 
 device_class_set_props(dc, vhost_user_gpu_properties);
 }
diff --git a/hw/input/vhost-user-input.c b/hw/input/vhost-user-input.c
index 273e96a..43d2ff3 100644
--- a/hw/input/vhost-user-input.c
+++ b/hw/input/vhost-user-input.c
@@ -79,6 +79,12 @@ static void vhost_input_set_config(VirtIODevice *vdev,
 virtio_notify_config(vdev);
 }
 
+static struct vhost_dev *vhost_input_get_vhost(VirtIODevice *vdev)
+{
+VHostUserInput *vhi = VHOST_USER_INPUT(vdev);
+return &vhi->vhost->dev;
+}
+
 static const VMStateDescription vmstate_vhost_input = {
 .name = "vhost-user-input",
 .unmigratable = 1,
@@ -93,6 +99,7 @@ static void vhost_input_class_init(ObjectClass *klass, void 
*data)
 dc->vmsd = &vmstate_vhost_input;
 vdc->get_config = vhost_input_get_config;
 vdc->set_config = vhost_input_set_config;
+vdc->get_vhost = vhost_input_get_vhost;
 vic->realize = vhost_input_realize;
 vic->change_active = vhost_input_change_active;
 }
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b275acf..2449b9c 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3610,6 +3610,14 @@ static bool dev_unplug_pending(void *opaque)
 return vdc->primary_unplug_pending(dev);
 }
 
+static struct vhost_dev *virtio_net_get_vhost(VirtIODevice *vdev)
+{
+VirtIONet *n = VIRTIO_NET(vdev);
+NetClientState *nc = qemu_get_queue(n->nic);
+struct vhost_net *net = get_vhost_net(nc->peer);
+return &net->dev;
+}
+
 static const VMStateDescription vmstate_virtio_net = {
 .name = "virtio-net",
 .minimum_version_id = VIRTIO_NET_VM_VERSION,
@@ -3712,6 +3720,7 @@ static void virtio_net_class_init(ObjectClass *klass, 
void *data)
 vdc->post_load = virtio_net_post_load_virtio;
 vdc->vmsd = &vmstate_virtio_net_device;
 vdc->primary_unplug_pending = primary_unplug_pending;
+vdc->get_vhost = virtio_net_get_vhost;
 }
 
 static const TypeInfo virtio_net_info = {
diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 039caf2..b0a9c45 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -264,6 +264,13 @@ static void vhost_scsi_unrealize(DeviceState *dev)
 virtio_scsi_common_unrealize(dev);
 }
 
+static struct vhost_dev *vhost_scsi_get_vhost(VirtIODevice *vdev)
+{
+VHostSCSI

[PATCH v8 6/8] qmp: add QMP commands for virtio/vhost queue-status

2021-10-27 Thread Jonah Palmer

From: Laurent Vivier 

These new commands show the internal status of a VirtIODevice's
VirtQueue and a vhost device's vhost_virtqueue (if active).

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-stub.c |  14 +++
 hw/virtio/virtio.c  | 103 +++
 qapi/virtio.json| 268 
 3 files changed, 385 insertions(+)

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index ddb592f..387803d 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -17,3 +17,17 @@ VirtioStatus *qmp_x_debug_virtio_status(const char* path, 
Error **errp)
 {
 return qmp_virtio_unsupported(errp);
 }
+
+VirtVhostQueueStatus *qmp_x_debug_virtio_vhost_queue_status(const char *path,
+uint16_t queue,
+Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
+
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue,
+ Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5bac549..7fd98c5 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -4284,6 +4284,109 @@ VirtioStatus *qmp_x_debug_virtio_status(const char 
*path, Error **errp)
 return status;
 }
 
+VirtVhostQueueStatus *qmp_x_debug_virtio_vhost_queue_status(const char *path,
+uint16_t queue,
+Error **errp)
+{
+VirtIODevice *vdev;
+VirtVhostQueueStatus *status;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIODevice", path);
+return NULL;
+}
+
+if (!vdev->vhost_started) {
+error_setg(errp, "Error: vhost device has not started yet");
+return NULL;
+}
+
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+struct vhost_dev *hdev = vdc->get_vhost(vdev);
+
+if (queue < hdev->vq_index || queue >= hdev->vq_index + hdev->nvqs) {
+error_setg(errp, "Invalid vhost virtqueue number %d", queue);
+return NULL;
+}
+
+status = g_new0(VirtVhostQueueStatus, 1);
+status->device_name = g_strdup(vdev->name);
+status->kick = hdev->vqs[queue].kick;
+status->call = hdev->vqs[queue].call;
+status->desc = (uint64_t)(unsigned long)hdev->vqs[queue].desc;
+status->avail = (uint64_t)(unsigned long)hdev->vqs[queue].avail;
+status->used = (uint64_t)(unsigned long)hdev->vqs[queue].used;
+status->num = hdev->vqs[queue].num;
+status->desc_phys = hdev->vqs[queue].desc_phys;
+status->desc_size = hdev->vqs[queue].desc_size;
+status->avail_phys = hdev->vqs[queue].avail_phys;
+status->avail_size = hdev->vqs[queue].avail_size;
+status->used_phys = hdev->vqs[queue].used_phys;
+status->used_size = hdev->vqs[queue].used_size;
+
+return status;
+}
+
+VirtQueueStatus *qmp_x_debug_virtio_queue_status(const char *path,
+ uint16_t queue,
+ Error **errp)
+{
+VirtIODevice *vdev;
+VirtQueueStatus *status;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIODevice", path);
+return NULL;
+}
+
+if (queue >= VIRTIO_QUEUE_MAX || !virtio_queue_get_num(vdev, queue)) {
+error_setg(errp, "Invalid virtqueue number %d", queue);
+return NULL;
+}
+
+status = g_new0(VirtQueueStatus, 1);
+status->device_name = g_strdup(vdev->name);
+status->queue_index = vdev->vq[queue].queue_index;
+status->inuse = vdev->vq[queue].inuse;
+status->vring_num = vdev->vq[queue].vring.num;
+status->vring_num_default = vdev->vq[queue].vring.num_default;
+status->vring_align = vdev->vq[queue].vring.align;
+status->vring_desc = vdev->vq[queue].vring.desc;
+status->vring_avail = vdev->vq[queue].vring.avail;
+status->vring_used = vdev->vq[queue].vring.used;
+status->used_idx = vdev->vq[queue].used_idx;
+status->signalled_used = vdev->vq[queue].signalled_used;
+status->signalled_used_valid = vdev->vq[queue].signalled_used_valid;
+
+if (vdev->vhost_started) {
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+struct vhost_dev *hdev = vdc->get_vhost(vdev);
+
+/* check if vq index exists for vhost as well  */
+if (queue >= hdev->vq_index && queue < hdev->vq_index + hdev->nvqs) {
+status->has_last_avail_idx = true;
+
+int vhost_vq_index =
+hdev->vhost_ops->vhost_get_vq_index(hdev, queue);
+struct vhost_vring_state state = {
+.index = vhost_vq_index,
+};
+
+stat

Re: [PATCH 1/1] gitlab-ci: Only push docker images to registry from /master branch

2021-10-27 Thread Daniel P . Berrangé

On Wed, Oct 27, 2021 at 01:35:25PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > This scenario a tricky problem, and I'm not seeing an easy answer to it
> > so far.
> 
> Only exclude stable branches?

The stable branches issue is just one of the bugs - the bigger bug is
that we're publishing containers in the qemu-project registry from
the 'staging' branch pipelines, even if the pipeline ultimately
fails.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v8 3/8] qmp: add QMP command x-debug-query-virtio

2021-10-27 Thread Jonah Palmer

From: Laurent Vivier 

This new command lists all the instances of VirtIODevice with
their QOM paths and virtio type/name.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/meson.build  |  2 ++
 hw/virtio/virtio-stub.c| 14 ++
 hw/virtio/virtio.c | 27 +++
 include/hw/virtio/virtio.h |  1 +
 qapi/meson.build   |  1 +
 qapi/qapi-schema.json  |  1 +
 qapi/virtio.json   | 67 ++
 tests/qtest/qmp-cmd-test.c |  1 +
 8 files changed, 114 insertions(+)
 create mode 100644 hw/virtio/virtio-stub.c
 create mode 100644 qapi/virtio.json

diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 521f7d6..d893f5f 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -6,8 +6,10 @@ softmmu_virtio_ss.add(when: 'CONFIG_VHOST', if_false: 
files('vhost-stub.c'))
 
 softmmu_ss.add_all(when: 'CONFIG_VIRTIO', if_true: softmmu_virtio_ss)
 softmmu_ss.add(when: 'CONFIG_VIRTIO', if_false: files('vhost-stub.c'))
+softmmu_ss.add(when: 'CONFIG_VIRTIO', if_false: files('virtio-stub.c'))
 
 softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('vhost-stub.c'))
+softmmu_ss.add(when: 'CONFIG_ALL', if_true: files('virtio-stub.c'))
 
 virtio_ss = ss.source_set()
 virtio_ss.add(files('virtio.c'))
diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
new file mode 100644
index 000..d4a88f5
--- /dev/null
+++ b/hw/virtio/virtio-stub.c
@@ -0,0 +1,14 @@
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qapi-commands-virtio.h"
+
+static void *qmp_virtio_unsupported(Error **errp)
+{
+error_setg(errp, "Virtio is disabled");
+return NULL;
+}
+
+VirtioInfoList *qmp_x_debug_query_virtio(Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 7050bd5..ad17be7 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -13,6 +13,8 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qapi-commands-virtio.h"
+#include "qapi/qapi-visit-virtio.h"
 #include "cpu.h"
 #include "trace.h"
 #include "qemu/error-report.h"
@@ -29,6 +31,9 @@
 #include "sysemu/runstate.h"
 #include "standard-headers/linux/virtio_ids.h"
 
+/* QAPI list of VirtIODevices */
+static QTAILQ_HEAD(, VirtIODevice) virtio_list;
+
 /*
  * The alignment to use between consumer and producer parts of vring.
  * x86 pagesize again. This is the default, used by transports like PCI
@@ -3709,6 +3714,7 @@ static void virtio_device_realize(DeviceState *dev, Error 
**errp)
 vdev->listener.commit = virtio_memory_listener_commit;
 vdev->listener.name = "virtio";
 memory_listener_register(&vdev->listener, vdev->dma_as);
+QTAILQ_INSERT_TAIL(&virtio_list, vdev, next);
 }
 
 static void virtio_device_unrealize(DeviceState *dev)
@@ -3723,6 +3729,7 @@ static void virtio_device_unrealize(DeviceState *dev)
 vdc->unrealize(dev);
 }
 
+QTAILQ_REMOVE(&virtio_list, vdev, next);
 g_free(vdev->bus_name);
 vdev->bus_name = NULL;
 }
@@ -3896,6 +3903,8 @@ static void virtio_device_class_init(ObjectClass *klass, 
void *data)
 vdc->stop_ioeventfd = virtio_device_stop_ioeventfd_impl;
 
 vdc->legacy_features |= VIRTIO_LEGACY_FEATURES;
+
+QTAILQ_INIT(&virtio_list);
 }
 
 bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
@@ -3906,6 +3915,24 @@ bool virtio_device_ioeventfd_enabled(VirtIODevice *vdev)
 return virtio_bus_ioeventfd_enabled(vbus);
 }
 
+VirtioInfoList *qmp_x_debug_query_virtio(Error **errp)
+{
+VirtioInfoList *list = NULL;
+VirtioInfoList *node;
+VirtIODevice *vdev;
+
+QTAILQ_FOREACH(vdev, &virtio_list, next) {
+DeviceState *dev = DEVICE(vdev);
+node = g_new0(VirtioInfoList, 1);
+node->value = g_new(VirtioInfo, 1);
+node->value->path = g_strdup(dev->canonical_path);
+node->value->type = g_strdup(vdev->name);
+QAPI_LIST_PREPEND(list, node->value);
+}
+
+return list;
+}
+
 static const TypeInfo virtio_device_info = {
 .name = TYPE_VIRTIO_DEVICE,
 .parent = TYPE_DEVICE,
diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index 105b98c..eceaafc 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -110,6 +110,7 @@ struct VirtIODevice
 bool use_guest_notifier_mask;
 AddressSpace *dma_as;
 QLIST_HEAD(, VirtQueue) *vector_queues;
+QTAILQ_ENTRY(VirtIODevice) next;
 };
 
 struct VirtioDeviceClass {
diff --git a/qapi/meson.build b/qapi/meson.build
index c356a38..df5662e 100644
--- a/qapi/meson.build
+++ b/qapi/meson.build
@@ -45,6 +45,7 @@ qapi_all_modules = [
   'sockets',
   'trace',
   'transaction',
+  'virtio',
   'yank',
 ]
 if have_system
diff --git a/qapi/qapi-schema.json b/qapi/qapi-schema.json
index 4912b97..1512ada 100644
--- a/qapi/qapi-schema.json
+++ b/qapi/qapi-schema.json
@@ -93,3 +93,4 @@
 { 'include': 'audio.json' }
 { 'include': 'acpi.json' }
 { 'include': 'pci.json' }
+{ 'include': 'virt

Re: [PATCH 0/1] gitlab-ci: Only push docker images to mainstream registry from /master

2021-10-27 Thread Daniel P . Berrangé

On Tue, Oct 26, 2021 at 04:55:08PM +0200, Philippe Mathieu-Daudé wrote:
> Hi,
> 
> I guess I got very unlucky because I happened to pull the docker
> images from the mainstream registry in a very short time frame
> where they jumped back in time... Then I kept using them to run
> my tests during 1 week trying to understand why I was having odd
> build failures. Then I realize the Ubuntu docker images were out
> of sync. I pulled again and it was working, so I searched for the
> mainstream job producing the outdated images and found a pipeline
> pushing the 'stable-6.0-staging' branch. This branch doesn't
> contain the recent gitlab-ci and Dockerfile changes...
> 
> Similarly, this branch doesn't contain commit eafadbbbac0
> ("gitlab: only let pages be published from default branch") so
> outdated documentation got pushed for a short time.
> 
> This patch won't fix branches pushed from the past, but at least
> it should avoid to reproduce this problem in the future.
> 
> Any idea how to improve the GitLab infrastructure to avoid these
> kind of problems in the future? Is it possible to enforce
> restrictions from the project configuration, rather than the
> repository YAML file?

The problem here is the general way we are handling docker images
in the CI pipeline.

In the first stage of the pipeline we build and publish images
with a tag   "$reponame/$imagename:latest".  We need to publish
them because the next stage of the pipeline has to be able to
pull the just built images - stages can't directly inherit the
docker images from each stages without going via the registry.

The use of ":latest" here is the root cause of our problems.
It works only under the assumptions:

  - you have a single pipeline running for any repository at
any point in time
  - you only need a single maintainance stage (master)

The first assumption was never really true. In the main qemu-project
namespace we sometimes end up ith pipelines runnning on 'master' for
just pushed series, and for 'staging' for series being tested for
merge. Most of the time we get away with this but we've seen a few
rare CI failures from it. Similarly in user forks most of the time
we get away with it, but sometimes users might quickly push to
different branches. In all cases the race is only a problem when
a patch series contains dockerfile changes, which is why it is such
a rare problem

So far we've only been relying on the CI pipeline for pushes to master,
since stable branch maint has been awol for a while. We're soon going
to starting violating this assumption though.

IOW, we can't keep relying on ":latest", as its going to cause increasing
trouble.

The reason we picked ":latest" though is because gitlab CI config is a
bit inflexible:

  - It has to be either a fixed string, or the contents of a standard
environment variable present in gitlab:

https://docs.gitlab.com/ee/ci/variables/predefined_variables.html

We can't populate our own env variables dynamically, nor
programatically define a tag

  - The tag contents has to satisfy the Docker restrictions on valid
characters in tags

  "A tag name must be valid ASCII and may contain lowercase and 
   uppercase letters, digits, underscores, periods and dashes. 
   A tag name may not start with a period or a dash and may 
   contain a maximum of 128 characters."

Essentially we need two tags

 - One that is only for use during execution of the current pipeline
 - One that is published only on success of the pipeline on master,
   to serve as cache for future pipelines, or to let developer pull

Notably the latter is more restrictive that git branch names. We could
assume users always have "sensible" branch names that are less than
128 chars and only alpha-num characters plus dash/underscore. This
would be fine for my personal branch naming, but I wonder if anyone
uses wierd branch names that would violate docker tag name rules ?
Perhaps we just accept that risk and have the CI job that builds the
container fail + print a clear message to user to rename their branch
to something sensible.

Ideally we would take a sha256 sum of the dockerfile (and parent layers
it inherits from) and use that as the image tag. That fails with the
constraint number 1 though - we can't programmatically set tags. This
would maximise our cache hit rate though, compared to the single
"latest" tag we publish.

The other alternative option here is to use the current git commit
hash to as the image tag, since this is valid docker tag name and is
unique for the dockerfile contents even if multiple pipelines run for
the same commit.

An even more unique option is to use the unique pipeline ID as the
tag.

Using either commit hash or pipeline ID will lead to an explosion in
the number of tags present in the repository's container registry.
GitLab has ability to periodically cleanup old tags but on all my
repos this appears disabled by default.

We could manually delete the tag, at the en

Re: [PATCH v3] i386: docs: Briefly describe KVM PV features

2021-10-27 Thread Vitaly Kuznetsov

Igor Mammedov  writes:

> On Mon,  4 Oct 2021 16:04:45 +0200
> Vitaly Kuznetsov  wrote:
>

Thanks for the review! As I can see, the patch already made it to
'master':

commit 7f7c8d0ce3630849a4df3d627b11de354fcb3bb0
Author: Vitaly Kuznetsov 
Date:   Mon Oct 4 16:04:45 2021 +0200

i386: docs: Briefly describe KVM PV features

we can send follow-ups, of course. 

>> KVM PV features don't seem to be documented anywhere, in particular, the
>> fact that some of the features are enabled by default and some are not can
>> only be figured out from the code.
>> 
>> Signed-off-by: Vitaly Kuznetsov 
>> ---
>> Changes since "[PATCH v2 0/8] i386: Assorted KVM PV and Hyper-V feature
>>  improvements" [Paolo Bonzini]:
>> - Convert to 'rst' and move to docs/system/i386/kvm-pv.rst.
>> - Add information about the version of Linux that introduced the particular
>>   PV feature.
>> ---
>>  docs/system/i386/kvm-pv.rst | 100 
>>  docs/system/target-i386.rst |   1 +
>>  2 files changed, 101 insertions(+)
>>  create mode 100644 docs/system/i386/kvm-pv.rst
>> 
>> diff --git a/docs/system/i386/kvm-pv.rst b/docs/system/i386/kvm-pv.rst
>> new file mode 100644
>> index ..1e5a9923ef45
>> --- /dev/null
>> +++ b/docs/system/i386/kvm-pv.rst
>> @@ -0,0 +1,100 @@
>> +Paravirtualized KVM features
>> +
>> +
>> +Description
>> +---
>> +
>> +In some cases when implementing hardware interfaces in software is slow, 
>> ``KVM``
>> +implements its own paravirtualized interfaces.
>> +
>> +Setup
>> +-
>> +
>> +Paravirtualized ``KVM`` features are represented as CPU flags. The following
>> +features are enabled by default for any CPU model when ``KVM`` acceleration 
>> is
>> +enabled:
>
> /if host kernel supports them
>

It does as QEMU requires linux >= 4.5. I'm not sure what happens if it
doesn't, maybe it won't start. 

>> +
>> +- ``kvmclock``
>> +- ``kvm-nopiodelay``
>
>> +- ``kvm-asyncpf``
>
> later you say it's not enabled by default since x.y and something else
> should be used instead

The situation is a bit weird. QEMU will still be enabling kvm-asyncpf by
default. This, however, has no effect currently as KVM dropped support
for this feature (in favor of kvm-asyncpf-int but this one is *not*
enabled by default)

>
> maybe add a kernel version for each item in this list aka: (since: ... 
> [,till])
>
>> +- ``kvm-steal-time``
>> +- ``kvm-pv-eoi``
>> +- ``kvmclock-stable-bit``
>> +
>> +``kvm-msi-ext-dest-id`` feature is enabled by default in x2apic mode with 
>> split
>> +irqchip (e.g. "-machine ...,kernel-irqchip=split -cpu ...,x2apic").
>
>
>> +Note: when CPU model ``host`` is used, QEMU passes through all supported
>> +paravirtualized ``KVM`` features to the guest.
>
> Is it true in case of kvm-pv-enforce-cpuid=on ?

Yes, I believe these two things are orthogonal: CPU model 'host' will
give you everything supported by the kernel, 'kvm-pv-enforce-cpuid' will
tell KVM to forbid using features, not exposed in guest visible CPUIDs:
but combined with 'host' this is going to be an empty set as all
features are enabled.

>
> Also I'd s/passes through/enables/
> on the grounds that host CPUID simply doesn't have such CPUIDs
> so it's a bit confusing.

I meant to say 'passes through' from KVM, not from pCPU but I see why
this is not clear.

>
>
>> +Existing features
>> +-
>> +
>> +``kvmclock``
>> +  Expose a ``KVM`` specific paravirtualized clocksource to the guest. 
>> Supported
>> +  since Linux v2.6.26.
>> +
>> +``kvm-nopiodelay``
>> +  The guest doesn't need to perform delays on PIO operations. Supported 
>> since
>> +  Linux v2.6.26.
>> +
>> +``kvm-mmu``
>> +  This feature is deprecated.
>> +
>> +``kvm-asyncpf``
>> +  Enable asynchronous page fault mechanism. Supported since Linux v2.6.38.
>> +  Note: since Linux v5.10 the feature is deprecated and not enabled by 
>> ``KVM``.
>
>> +  Use ``kvm-asyncpf-int`` instead.
> 'Use' or 'Used' by default?
>

'kvm-asyncpf' is a dead feature now so in case users want to get
Asynchronouse Page Faults they need to enable 'kvm-asyncpf-int'
manually, thus 'use'.

>
>> +``kvm-steal-time``
>> +  Enable stolen (when guest vCPU is not running) time accounting. Supported
>> +  since Linux v3.1.
>> +
>> +``kvm-pv-eoi``
>> +  Enable paravirtualized end-of-interrupt signaling. Supported since Linux
>> +  v3.10.
>> +
>> +``kvm-pv-unhalt``
>> +  Enable paravirtualized spinlocks support. Supported since Linux v3.12.
>> +
>> +``kvm-pv-tlb-flush``
>> +  Enable paravirtualized TLB flush mechanism. Supported since Linux v4.16.
>> +
>> +``kvm-pv-ipi``
>> +  Enable paravirtualized IPI mechanism. Supported since Linux v4.19.
>> +
>> +``kvm-poll-control``
>> +  Enable host-side polling on HLT control from the guest. Supported since 
>> Linux
>> +  v5.10.
>> +
>> +``kvm-pv-sched-yield``
>> +  Enable paravirtualized sched yield feature. Supported since Linux v5.10.
>> +
>> +``kvm-asyncpf-int``
>> +  Enable interrupt based asy

Re: [PATCH v8 0/8] hmp, qmp: Add commands to introspect virtio devices

2021-10-27 Thread Daniel P . Berrangé

On Wed, Oct 27, 2021 at 07:41:41AM -0400, Jonah Palmer wrote:
> This series introduces new QMP/HMP commands to dump the status of a
> virtio device at different levels.
> 
> [Jonah: Rebasing previous patchset from Oct. 5 (v7). Original patches
>  are from Laurent Vivier from May 2020.
> 
>  Rebase from v7 to v8 includes an additional assert to make sure
>  we're not returning NULL in virtio_id_to_name(). Rebase also
>  includes minor additions/edits to qapi/virtio.json.]
> 
> 1. Main command
> 
> HMP Only:
> 
> virtio [subcommand]
> 
> Example:
> 
> List all sub-commands:
> 
> (qemu) virtio
> virtio query  -- List all available virtio devices
> virtio status path -- Display status of a given virtio device
> virtio queue-status path queue -- Display status of a given virtio 
> queue
> virtio vhost-queue-status path queue -- Display status of a given 
> vhost queue
> virtio queue-element path queue [index] -- Display element of a given 
> virtio queue

I don't see a compelling reason why these are setup as sub-commands
under a new "virtio" top level. This HMP approach and the QMP 'x-debug-query'
naming just feels needlessly different from the current QEMU practices.

IMHO they should just be "info" subcommands for HMP. ie

 info virtio  -- List all available virtio devices
 info virtio-status path -- Display status of a given virtio device
 info virtio-queue-status path queue -- Display status of a given 
virtio queue
 info virtio-vhost-queue-status path queue -- Display status of a given 
vhost queue
 info virtio-queue-element path queue [index] -- Display element of a 
given virtio queue

While the corresponding QMP commands ought to be

 x-query-virtio
 x-query-virtio-status
 x-query-virtio-queue-status
 x-query-virtio-vhost-queue-status
 x-query-virtio-queue-element


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v8 4/8] qmp: add QMP command x-debug-virtio-status

2021-10-27 Thread Jonah Palmer

From: Laurent Vivier 

This new command shows the status of a VirtIODevice, including
its corresponding vhost device status (if active).

Next patch will improve output by decoding feature bits, including
vhost device's feature bits (backend, protocol, acked, and features).
Also will decode status bits of a VirtIODevice.

Next patch will also suppress the vhost device field from displaying
if no vhost device is active for a given VirtIODevice.

Signed-off-by: Jonah Palmer 
---
 hw/virtio/virtio-stub.c |   5 +
 hw/virtio/virtio.c  |  96 ++
 qapi/virtio.json| 255 
 3 files changed, 356 insertions(+)

diff --git a/hw/virtio/virtio-stub.c b/hw/virtio/virtio-stub.c
index d4a88f5..ddb592f 100644
--- a/hw/virtio/virtio-stub.c
+++ b/hw/virtio/virtio-stub.c
@@ -12,3 +12,8 @@ VirtioInfoList *qmp_x_debug_query_virtio(Error **errp)
 {
 return qmp_virtio_unsupported(errp);
 }
+
+VirtioStatus *qmp_x_debug_virtio_status(const char* path, Error **errp)
+{
+return qmp_virtio_unsupported(errp);
+}
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index ad17be7..8d13d27 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -3933,6 +3933,102 @@ VirtioInfoList *qmp_x_debug_query_virtio(Error **errp)
 return list;
 }
 
+static VirtIODevice *virtio_device_find(const char *path)
+{
+VirtIODevice *vdev;
+
+QTAILQ_FOREACH(vdev, &virtio_list, next) {
+DeviceState *dev = DEVICE(vdev);
+
+if (strcmp(dev->canonical_path, path) != 0) {
+continue;
+}
+return vdev;
+}
+
+return NULL;
+}
+
+VirtioStatus *qmp_x_debug_virtio_status(const char *path, Error **errp)
+{
+VirtIODevice *vdev;
+VirtioStatus *status;
+
+vdev = virtio_device_find(path);
+if (vdev == NULL) {
+error_setg(errp, "Path %s is not a VirtIO device", path);
+return NULL;
+}
+
+status = g_new0(VirtioStatus, 1);
+status->vhost_dev = g_new0(VhostStatus, 1);
+status->name = g_strdup(vdev->name);
+status->device_id = vdev->device_id;
+status->vhost_started = vdev->vhost_started;
+status->guest_features = vdev->guest_features;
+status->host_features = vdev->host_features;
+status->backend_features = vdev->backend_features;
+
+switch (vdev->device_endian) {
+case VIRTIO_DEVICE_ENDIAN_LITTLE:
+status->device_endian = VIRTIO_STATUS_ENDIANNESS_LITTLE;
+break;
+case VIRTIO_DEVICE_ENDIAN_BIG:
+status->device_endian = VIRTIO_STATUS_ENDIANNESS_BIG;
+break;
+default:
+status->device_endian = VIRTIO_STATUS_ENDIANNESS_UNKNOWN;
+break;
+}
+
+status->num_vqs = virtio_get_num_queues(vdev);
+status->status = vdev->status;
+status->isr = vdev->isr;
+status->queue_sel = vdev->queue_sel;
+status->vm_running = vdev->vm_running;
+status->broken = vdev->broken;
+status->disabled = vdev->disabled;
+status->use_started = vdev->use_started;
+status->started = vdev->started;
+status->start_on_kick = vdev->start_on_kick;
+status->disable_legacy_check = vdev->disable_legacy_check;
+status->bus_name = g_strdup(vdev->bus_name);
+status->use_guest_notifier_mask = vdev->use_guest_notifier_mask;
+
+if (vdev->vhost_started) {
+VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
+struct vhost_dev *hdev = vdc->get_vhost(vdev);
+
+status->vhost_dev->n_mem_sections = hdev->n_mem_sections;
+status->vhost_dev->n_tmp_sections = hdev->n_tmp_sections;
+status->vhost_dev->nvqs = hdev->nvqs;
+status->vhost_dev->vq_index = hdev->vq_index;
+status->vhost_dev->features = hdev->features;
+status->vhost_dev->acked_features = hdev->acked_features;
+status->vhost_dev->backend_features = hdev->backend_features;
+status->vhost_dev->protocol_features = hdev->protocol_features;
+status->vhost_dev->max_queues = hdev->max_queues;
+status->vhost_dev->backend_cap = hdev->backend_cap;
+status->vhost_dev->log_enabled = hdev->log_enabled;
+status->vhost_dev->log_size = hdev->log_size;
+} else {
+status->vhost_dev->n_mem_sections = 0;
+status->vhost_dev->n_tmp_sections = 0;
+status->vhost_dev->nvqs = 0;
+status->vhost_dev->vq_index = 0;
+status->vhost_dev->features = 0;
+status->vhost_dev->acked_features = 0;
+status->vhost_dev->backend_features = 0;
+status->vhost_dev->protocol_features = 0;
+status->vhost_dev->max_queues = 0;
+status->vhost_dev->backend_cap = 0;
+status->vhost_dev->log_enabled = false;
+status->vhost_dev->log_size = 0;
+}
+
+return status;
+}
+
 static const TypeInfo virtio_device_info = {
 .name = TYPE_VIRTIO_DEVICE,
 .parent = TYPE_DEVICE,
diff --git a/qapi/virtio.json b/qapi/virtio.json
index 4490c2c..656a26f 100644
--- a/qapi/virtio.json
+++ b/qapi/vi

Re: [PATCH v8 5/8] qmp: decode feature & status bits in virtio-status

2021-10-27 Thread David Hildenbrand

On 27.10.21 13:41, Jonah Palmer wrote:
> From: Laurent Vivier 
> 
> Display feature names instead of bitmaps for host, guest, and
> backend for VirtIODevice.
> 
> Display status names instead of bitmaps for VirtIODevice.
> 
> Display feature names instead of bitmaps for backend, protocol,
> acked, and features (hdev->features) for vhost devices.
> 
> Decode features according to device type. Decode status
> according to configuration status bitmap (config_status_map).
> Decode vhost user protocol features according to vhost user
> protocol bitmap (vhost_user_protocol_map).
> 
> Transport features are on the first line. Undecoded bits
> (if any) are stored in a separate field. Vhost device field
> wont show if there's no vhost active for a given VirtIODevice.
> 
> Signed-off-by: Jonah Palmer 
> ---
>  hw/block/virtio-blk.c  |  28 ++
>  hw/char/virtio-serial-bus.c|  11 +
>  hw/display/virtio-gpu-base.c   |  18 +-
>  hw/input/virtio-input.c|  11 +-
>  hw/net/virtio-net.c|  47 
>  hw/scsi/virtio-scsi.c  |  17 ++
>  hw/virtio/vhost-user-fs.c  |  10 +
>  hw/virtio/vhost-vsock-common.c |  10 +
>  hw/virtio/virtio-balloon.c |  14 +
>  hw/virtio/virtio-crypto.c  |  10 +
>  hw/virtio/virtio-iommu.c   |  14 +
>  hw/virtio/virtio.c | 273 ++-
>  include/hw/virtio/vhost.h  |   3 +
>  include/hw/virtio/virtio.h |  17 ++
>  qapi/virtio.json   | 577 
> ++---

Any particular reason we're not handling virtio-mem?

(there is only a single feature bit so far, a second one to be
introduced soon)


-- 
Thanks,

David / dhildenb

[PATCH v8 5/8] qmp: decode feature & status bits in virtio-status

2021-10-27 Thread Jonah Palmer

From: Laurent Vivier 

Display feature names instead of bitmaps for host, guest, and
backend for VirtIODevice.

Display status names instead of bitmaps for VirtIODevice.

Display feature names instead of bitmaps for backend, protocol,
acked, and features (hdev->features) for vhost devices.

Decode features according to device type. Decode status
according to configuration status bitmap (config_status_map).
Decode vhost user protocol features according to vhost user
protocol bitmap (vhost_user_protocol_map).

Transport features are on the first line. Undecoded bits
(if any) are stored in a separate field. Vhost device field
wont show if there's no vhost active for a given VirtIODevice.

Signed-off-by: Jonah Palmer 
---
 hw/block/virtio-blk.c  |  28 ++
 hw/char/virtio-serial-bus.c|  11 +
 hw/display/virtio-gpu-base.c   |  18 +-
 hw/input/virtio-input.c|  11 +-
 hw/net/virtio-net.c|  47 
 hw/scsi/virtio-scsi.c  |  17 ++
 hw/virtio/vhost-user-fs.c  |  10 +
 hw/virtio/vhost-vsock-common.c |  10 +
 hw/virtio/virtio-balloon.c |  14 +
 hw/virtio/virtio-crypto.c  |  10 +
 hw/virtio/virtio-iommu.c   |  14 +
 hw/virtio/virtio.c | 273 ++-
 include/hw/virtio/vhost.h  |   3 +
 include/hw/virtio/virtio.h |  17 ++
 qapi/virtio.json   | 577 ++---
 15 files changed, 1015 insertions(+), 45 deletions(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 505e574..c2e901f 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -13,6 +13,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qapi-visit-virtio.h"
 #include "qemu/iov.h"
 #include "qemu/module.h"
 #include "qemu/error-report.h"
@@ -32,6 +33,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "migration/qemu-file-types.h"
 #include "hw/virtio/virtio-access.h"
+#include "standard-headers/linux/vhost_types.h"
 
 /* Config size before the discard support (hide associated config fields) */
 #define VIRTIO_BLK_CFG_SIZE offsetof(struct virtio_blk_config, \
@@ -48,6 +50,32 @@ static const VirtIOFeature feature_sizes[] = {
 {}
 };
 
+qmp_virtio_feature_map_t blk_map[] = {
+#define FEATURE_ENTRY(name) \
+{ VIRTIO_BLK_F_##name, VIRTIO_BLK_FEATURE_##name }
+FEATURE_ENTRY(SIZE_MAX),
+FEATURE_ENTRY(SEG_MAX),
+FEATURE_ENTRY(GEOMETRY),
+FEATURE_ENTRY(RO),
+FEATURE_ENTRY(BLK_SIZE),
+FEATURE_ENTRY(TOPOLOGY),
+FEATURE_ENTRY(MQ),
+FEATURE_ENTRY(DISCARD),
+FEATURE_ENTRY(WRITE_ZEROES),
+#ifndef VIRTIO_BLK_NO_LEGACY
+FEATURE_ENTRY(BARRIER),
+FEATURE_ENTRY(SCSI),
+FEATURE_ENTRY(FLUSH),
+FEATURE_ENTRY(CONFIG_WCE),
+#endif /* !VIRTIO_BLK_NO_LEGACY */
+#undef FEATURE_ENTRY
+#define FEATURE_ENTRY(name) \
+{ VHOST_F_##name, VIRTIO_BLK_FEATURE_##name }
+FEATURE_ENTRY(LOG_ALL),
+#undef FEATURE_ENTRY
+{ -1, -1 }
+};
+
 static void virtio_blk_set_config_size(VirtIOBlock *s, uint64_t host_features)
 {
 s->config_size = MAX(VIRTIO_BLK_CFG_SIZE,
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 232f4c9..fa57059 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 #include "qapi/error.h"
+#include "qapi/qapi-visit-virtio.h"
 #include "qemu/iov.h"
 #include "qemu/main-loop.h"
 #include "qemu/module.h"
@@ -32,6 +33,16 @@
 #include "hw/virtio/virtio-serial.h"
 #include "hw/virtio/virtio-access.h"
 
+qmp_virtio_feature_map_t serial_map[] = {
+#define FEATURE_ENTRY(name) \
+{ VIRTIO_CONSOLE_F_##name, VIRTIO_SERIAL_FEATURE_##name }
+FEATURE_ENTRY(SIZE),
+FEATURE_ENTRY(MULTIPORT),
+FEATURE_ENTRY(EMERG_WRITE),
+#undef FEATURE_ENTRY
+{ -1, -1 }
+};
+
 static struct VirtIOSerialDevices {
 QLIST_HEAD(, VirtIOSerial) devices;
 } vserdevices;
diff --git a/hw/display/virtio-gpu-base.c b/hw/display/virtio-gpu-base.c
index 5411a7b..a322349 100644
--- a/hw/display/virtio-gpu-base.c
+++ b/hw/display/virtio-gpu-base.c
@@ -12,13 +12,29 @@
  */
 
 #include "qemu/osdep.h"
-
+#include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/virtio-gpu.h"
 #include "migration/blocker.h"
 #include "qapi/error.h"
+#include "qapi/qapi-visit-virtio.h"
 #include "qemu/error-report.h"
 #include "trace.h"
 
+qmp_virtio_feature_map_t gpu_map[] = {
+#define FEATURE_ENTRY(name) \
+{ VIRTIO_GPU_F_##name, VIRTIO_GPU_FEATURE_##name }
+FEATURE_ENTRY(VIRGL),
+FEATURE_ENTRY(EDID),
+FEATURE_ENTRY(RESOURCE_UUID),
+FEATURE_ENTRY(RESOURCE_BLOB),
+#undef FEATURE_ENTRY
+#define FEATURE_ENTRY(name) \
+{ VHOST_F_##name, VIRTIO_GPU_FEATURE_##name }
+FEATURE_ENTRY(LOG_ALL),
+#undef FEATURE_ENTRY
+{ -1, -1 }
+};
+
 void
 virtio_gpu_base_reset(VirtIOGPUBase *g)
 {
diff --git a/hw/input/virtio-input.c b/hw/input/virtio-input.c
index 5b5398b..b4562a3 100644
--- a/hw/input/virtio-input.c
+++ b/hw/input/virtio-input.c
@@ -6,6 +6,7 @@
 
 #include "qe

[PATCH v8 8/8] hmp: add virtio commands

2021-10-27 Thread Jonah Palmer

From: Laurent Vivier 

This patch implements the HMP versions of the virtio QMP commands.

Signed-off-by: Jonah Palmer 
---
 docs/system/monitor.rst |   2 +
 hmp-commands-virtio.hx  | 250 ++
 hmp-commands.hx |  10 ++
 hw/virtio/virtio.c  | 355 
 include/monitor/hmp.h   |   5 +
 meson.build |   1 +
 monitor/misc.c  |  17 +++
 7 files changed, 640 insertions(+)
 create mode 100644 hmp-commands-virtio.hx

diff --git a/docs/system/monitor.rst b/docs/system/monitor.rst
index ff5c434..10418fc 100644
--- a/docs/system/monitor.rst
+++ b/docs/system/monitor.rst
@@ -21,6 +21,8 @@ The following commands are available:
 
 .. hxtool-doc:: hmp-commands.hx
 
+.. hxtool-doc:: hmp-commands-virtio.hx
+
 .. hxtool-doc:: hmp-commands-info.hx
 
 Integer expressions
diff --git a/hmp-commands-virtio.hx b/hmp-commands-virtio.hx
new file mode 100644
index 000..36aab94
--- /dev/null
+++ b/hmp-commands-virtio.hx
@@ -0,0 +1,250 @@
+HXCOMM Use DEFHEADING() to define headings in both help text and rST.
+HXCOMM Text between SRST and ERST is copied to the rST version and
+HXCOMM discarded from C version.
+HXCOMM
+HXCOMM DEF(command, args, callback, arg_string, help) is used to construct
+HXCOMM monitor info commands.
+HXCOMM
+HXCOMM HXCOMM can be used for comments, discarded from both rST and C.
+HXCOMM
+HXCOMM In this file, generally SRST fragments should have two extra
+HXCOMM spaces of indent, so that the documentation list item for "virtio cmd"
+HXCOMM appears inside the documentation list item for the top level
+HXCOMM "virtio" documentation entry. The exception is the first SRST
+HXCOMM fragment that defines that top level entry.
+
+SRST
+  ``virtio`` *subcommand*
+  Show various information about virtio
+
+  Example:
+
+  List all sub-commands::
+
+  (qemu) virtio
+  virtio query  -- List all available virtio devices
+  virtio status path -- Display status of a given virtio device
+  virtio queue-status path queue -- Display status of a given virtio queue
+  virtio vhost-queue-status path queue -- Display status of a given vhost queue
+  virtio queue-element path queue [index] -- Display element of a given virtio 
queue
+
+ERST
+
+  {
+.name   = "query",
+.args_type  = "",
+.params = "",
+.help   = "List all available virtio devices",
+.cmd= hmp_virtio_query,
+.flags  = "p",
+  },
+
+SRST
+  ``virtio query``
+  List all available virtio devices
+
+  Example:
+
+  List all available virtio devices in the machine::
+
+  (qemu) virtio query
+  /machine/peripheral/vsock0/virtio-backend [vhost-vsock]
+  /machine/peripheral/crypto0/virtio-backend [virtio-crypto]
+  /machine/peripheral-anon/device[2]/virtio-backend [virtio-scsi]
+  /machine/peripheral-anon/device[1]/virtio-backend [virtio-net]
+  /machine/peripheral-anon/device[0]/virtio-backend [virtio-serial]
+
+ERST
+
+  {
+.name   = "status",
+.args_type  = "path:s",
+.params = "path",
+.help   = "Display status of a given virtio device",
+.cmd= hmp_virtio_status,
+.flags  = "p",
+  },
+
+SRST
+  ``virtio status`` *path*
+  Display status of a given virtio device
+
+  Example:
+
+  Dump the status of virtio-net (vhost on)::
+
+  (qemu) virtio status /machine/peripheral-anon/device[1]/virtio-backend
+  /machine/peripheral-anon/device[1]/virtio-backend:
+device_name: virtio-net (vhost)
+device_id:   1
+vhost_started:   true
+bus_name:(null)
+broken:  false
+disabled:false
+disable_legacy_check:false
+started: true
+use_started: true
+start_on_kick:   false
+use_guest_notifier_mask: true
+vm_running:  true
+num_vqs: 3
+queue_sel:   2
+isr: 1
+endianness:  little
+status: acknowledge, driver, features-ok, driver-ok
+Guest features:   event-idx, indirect-desc, version-1
+  ctrl-mac-addr, guest-announce, ctrl-vlan, ctrl-rx, 
ctrl-vq, status, mrg-rxbuf,
+  host-ufo, host-ecn, host-tso6, host-tso4, guest-ufo, 
guest-ecn, guest-tso6,
+  guest-tso4, mac, ctrl-guest-offloads, guest-csum, csum
+Host features:protocol-features, event-idx, indirect-desc, version-1, 
any-layout, notify-on-empty
+  gso, ctrl-mac-addr, guest-announce, ctrl-rx-extra, 
ctrl-vlan, ctrl-rx, ctrl-vq,
+  status, mrg-rxbuf, host-ufo, host-ecn, host-tso6, 
host-tso4, guest-ufo, guest-ecn,
+  guest-tso6, guest-tso4, mac, ctrl-guest-offloads, 
guest-csum, csum
+Backend features: protocol-features, event-idx, indirect-desc, version-1, 
any-layout, notify-on-empty
+  gso, ctrl-mac-addr, guest-announce, ctrl-rx-ex

Re: [PATCH v2 2/3] memory: Make memory_region_is_mapped() succeed when mapped via an alias

2021-10-27 Thread David Hildenbrand

On 26.10.21 19:00, Philippe Mathieu-Daudé wrote:
> On 10/26/21 18:06, David Hildenbrand wrote:
>> memory_region_is_mapped() currently does not return "true" when a memory
>> region is mapped via an alias.
>>
>> Assuming we have:
>> alias (A0) -> alias (A1) -> region (R0)
>> Mapping A0 would currently only make memory_region_is_mapped() succeed
>> on A0, but not on A1 and R0.
>>
>> Let's fix that by adding a "mapped_via_alias" counter to memory regions and
>> updating it accordingly when an alias gets (un)mapped.
>>
>> I am not aware of actual issues, this is rather a cleanup to make it
>> consistent.
>>
>> Signed-off-by: David Hildenbrand 
>> ---
>>  include/exec/memory.h |  1 +
>>  softmmu/memory.c  | 12 +++-
>>  2 files changed, 12 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/exec/memory.h b/include/exec/memory.h
>> index a185b6dcb8..35382d9870 100644
>> --- a/include/exec/memory.h
>> +++ b/include/exec/memory.h
>> @@ -707,6 +707,7 @@ struct MemoryRegion {
>>  const MemoryRegionOps *ops;
>>  void *opaque;
>>  MemoryRegion *container;
>> +int mapped_via_alias; /* Mapped via an alias, container might be NULL */
>>  Int128 size;
>>  hwaddr addr;
>>  void (*destructor)(MemoryRegion *mr);
>> diff --git a/softmmu/memory.c b/softmmu/memory.c
>> index e5826faa0c..17ca896c38 100644
>> --- a/softmmu/memory.c
>> +++ b/softmmu/memory.c
>> @@ -2524,8 +2524,13 @@ static void 
>> memory_region_add_subregion_common(MemoryRegion *mr,
>> hwaddr offset,
>> MemoryRegion *subregion)
>>  {
>> +MemoryRegion *alias;
>> +
>>  assert(!subregion->container);
>>  subregion->container = mr;
>> +for (alias = subregion->alias; alias; alias = alias->alias) {
>> +alias->mapped_via_alias++;
>> +}
>>  subregion->addr = offset;
>>  memory_region_update_container_subregions(subregion);
>>  }
>> @@ -2550,9 +2555,14 @@ void memory_region_add_subregion_overlap(MemoryRegion 
>> *mr,
>>  void memory_region_del_subregion(MemoryRegion *mr,
>>   MemoryRegion *subregion)
>>  {
>> +MemoryRegion *alias;
>> +
>>  memory_region_transaction_begin();
>>  assert(subregion->container == mr);
>>  subregion->container = NULL;
>> +for (alias = subregion->alias; alias; alias = alias->alias) {
>> +alias->mapped_via_alias--;
> 
>assert(alias->mapped_via_alias >= 0);

Makes sense, I'll respin with that -- thanks!

-- 
Thanks,

David / dhildenb

Re: [PATCH v8 5/8] qmp: decode feature & status bits in virtio-status

2021-10-27 Thread Laurent Vivier


On 27/10/2021 13:59, David Hildenbrand wrote:

On 27.10.21 13:41, Jonah Palmer wrote:

From: Laurent Vivier 

Display feature names instead of bitmaps for host, guest, and
backend for VirtIODevice.

Display status names instead of bitmaps for VirtIODevice.

Display feature names instead of bitmaps for backend, protocol,
acked, and features (hdev->features) for vhost devices.

Decode features according to device type. Decode status
according to configuration status bitmap (config_status_map).
Decode vhost user protocol features according to vhost user
protocol bitmap (vhost_user_protocol_map).

Transport features are on the first line. Undecoded bits
(if any) are stored in a separate field. Vhost device field
wont show if there's no vhost active for a given VirtIODevice.

Signed-off-by: Jonah Palmer 
---
  hw/block/virtio-blk.c  |  28 ++
  hw/char/virtio-serial-bus.c|  11 +
  hw/display/virtio-gpu-base.c   |  18 +-
  hw/input/virtio-input.c|  11 +-
  hw/net/virtio-net.c|  47 
  hw/scsi/virtio-scsi.c  |  17 ++
  hw/virtio/vhost-user-fs.c  |  10 +
  hw/virtio/vhost-vsock-common.c |  10 +
  hw/virtio/virtio-balloon.c |  14 +
  hw/virtio/virtio-crypto.c  |  10 +
  hw/virtio/virtio-iommu.c   |  14 +
  hw/virtio/virtio.c | 273 ++-
  include/hw/virtio/vhost.h  |   3 +
  include/hw/virtio/virtio.h |  17 ++
  qapi/virtio.json   | 577 ++---


Any particular reason we're not handling virtio-mem?

(there is only a single feature bit so far, a second one to be
introduced soon)



I think this is because the v1 of the series has been written in March 2020 and it has not 
been update when virtio-mem has been added (June 2020).


Thanks,
Laurent

Re: [PATCH 16/33] target/ppc: Implement Vector Insert Word from GPR using Immediate insns

2021-10-27 Thread Matheus K. Ferst


On 26/10/2021 15:45, Paul A. Clarke wrote:

On Tue, Oct 26, 2021 at 09:58:15AM -0700, Richard Henderson wrote:

On 10/26/21 7:33 AM, Matheus K. Ferst wrote:

It says that "if UIM is greater than N, the result is undefined." My
first read was also that the outcome is "boundedly undefined," but I
guess it can be understood as "the resulting value in VRT will be
undefined" (like when the pseudo-code uses "VRT <- 0x_..._"), in
which case this patch and Mambo are correct.


If the reference simulator is fine with it, I am too.


FYI, it appears that the hardware does a partial insert, per an experiment:
```
1: x/i $pc
=> 0x16d4 :  vinsw   v2,r3,14
(gdb) p $v2.v4_int32
$1 = {0x1, 0x1, 0x1, 0x1}
(gdb) p $r3
$2 = 0x12345678
(gdb) nexti
(gdb) p $v2.v4_int32
$3 = {0x1234, 0x1, 0x1, 0x1}



Thanks for this test Paul. I'll add a comment about the hardware behavior.

--
Matheus K. Ferst
Instituto de Pesquisas ELDORADO 
Analista de Software
Aviso Legal - Disclaimer

Re: [PATCH V6 00/27] Live Update

2021-10-27 Thread Steven Sistare

Soon.  I'll aim for next week.  Thanks for your continued interest!

- Steve

On 10/27/2021 2:16 AM, Zheng Chuan wrote:
> Hi, Steve.
> Any updates for this series?

Re: [PATCH 0/1] gitlab-ci: Only push docker images to mainstream registry from /master

2021-10-27 Thread Gerd Hoffmann

  Hi,

> Notably the latter is more restrictive that git branch names. We could
> assume users always have "sensible" branch names that are less than
> 128 chars and only alpha-num characters plus dash/underscore. This
> would be fine for my personal branch naming, but I wonder if anyone
> uses wierd branch names that would violate docker tag name rules ?

/me uses slashes in branch names, i.e.

queue/$topic for patch queues
$hostname/$subject for my development branches.

take care,
  Gerd

Re: [PATCH 0/2] Fix mtfsf, mtfsfi and mtfsb1 bug

2021-10-27 Thread Matheus K. Ferst


CC'ing the reporter

On 20/10/2021 09:57, Lucas Mateus Castro (alqotel) wrote:

From: "Lucas Mateus Castro (alqotel)" 

The instructions mtfsf, mtfsfi and mtfsb, when called, fail to set the FI
bit (bit 46 in the FPSCR) and can set to 1 the reserved bit 52 of the
FPSCR, as reported in https://gitlab.com/qemu-project/qemu/-/issues/266
(although the bug report is only for mtfsf, the bug applies to mtfsfi and
mtfsb1 as well).

These instructions also fail to throw an exception when the exception
and enabling bits are set, this can be tested by adding
'prctl(PR_SET_FPEXC, PR_FP_EXC_PRECISE);' before the __builtin_mtfsf
call in the test case of the bug report.

These patches aim to fix these issues.

Lucas Mateus Castro (alqotel) (2):
   target/ppc: Fixed call to deferred exception
   target/ppc: ppc_store_fpscr doesn't update bit 52

  target/ppc/cpu.c   |  2 +-
  target/ppc/cpu.h   |  3 +++
  target/ppc/fpu_helper.c| 41 ++
  target/ppc/helper.h|  1 +
  target/ppc/translate/fp-impl.c.inc |  6 ++---
  5 files changed, 49 insertions(+), 4 deletions(-)

--
2.31.1



--
Matheus K. Ferst
Instituto de Pesquisas ELDORADO 
Analista de Software
Aviso Legal - Disclaimer

Re: [RFC 0/2] tls: add macros for coroutine-safe TLS variables

2021-10-27 Thread Stefan Hajnoczi

On Tue, Oct 26, 2021 at 10:10:44AM -0700, Richard Henderson wrote:
> On 10/26/21 9:34 AM, Stefan Hajnoczi wrote:
> > On Tue, Oct 26, 2021 at 08:10:16AM -0700, Richard Henderson wrote:
> > > On 10/26/21 6:22 AM, Stefan Hajnoczi wrote:
> > > > If "safe" TLS variables are opt-in then we'll likely have obscure bugs
> > > > when code changes to access a TLS variable that was previously never
> > > > accessed from a coroutine. There is no compiler error and no way to
> > > > detect this. When it happens debugging it is painful.
> > > 
> > > Co-routines are never used in user-only builds.
> > 
> > If developers have the choice of using __thread then bugs can slip
> > through.
> 
> Huh?  How.  No, really.

If there is no checkpatch.pl error then more instances of __thread will
slip in. Not everyone in the QEMU community will be aware of this issue,
so it's likely that code with __thread will get merged.

Subsystems that use coroutines today include block, 9p, mpqemu, io
channels, migration, colo, and monitor commands.

I understand that qemu-user is particularly unlikely to use coroutines.
Thomas' suggestion sounds good to me. Let's allow __thread only in
subsystems where it's safe.

Stefan

signature.asc
Description: PGP signature

Re: [PATCH 0/1] gitlab-ci: Only push docker images to mainstream registry from /master

2021-10-27 Thread Daniel P . Berrangé

On Wed, Oct 27, 2021 at 02:26:58PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > Notably the latter is more restrictive that git branch names. We could
> > assume users always have "sensible" branch names that are less than
> > 128 chars and only alpha-num characters plus dash/underscore. This
> > would be fine for my personal branch naming, but I wonder if anyone
> > uses wierd branch names that would violate docker tag name rules ?
> 
> /me uses slashes in branch names, i.e.
> 
> queue/$topic for patch queues
> $hostname/$subject for my development branches.

Ok, good to know. So that example clearly rules out use of git branch
names as docker tags.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

[PATCH v1 06/12] memory-device: Generalize memory_device_used_region_size()

2021-10-27 Thread David Hildenbrand

Let's generalize traversal of all plugged memory devices to collect
information in the context of memory_device_check_addable() to prepare for
future changes.

Signed-off-by: David Hildenbrand 
---
 hw/mem/memory-device.c | 22 +-
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 68a2c3dbcc..a915894819 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -50,20 +50,24 @@ static int memory_device_build_list(Object *obj, void 
*opaque)
 return 0;
 }
 
-static int memory_device_used_region_size(Object *obj, void *opaque)
+struct memory_devices_info {
+uint64_t region_size;
+};
+
+static int memory_devices_collect_info(Object *obj, void *opaque)
 {
-uint64_t *size = opaque;
+struct memory_devices_info *i = opaque;
 
 if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
 const DeviceState *dev = DEVICE(obj);
 const MemoryDeviceState *md = MEMORY_DEVICE(obj);
 
 if (dev->realized) {
-*size += memory_device_get_region_size(md, &error_abort);
+i->region_size += memory_device_get_region_size(md, &error_abort);
 }
 }
 
-object_child_foreach(obj, memory_device_used_region_size, opaque);
+object_child_foreach(obj, memory_devices_collect_info, opaque);
 return 0;
 }
 
@@ -71,7 +75,7 @@ static void memory_device_check_addable(MachineState *ms, 
MemoryRegion *mr,
 Error **errp)
 {
 const uint64_t size = memory_region_size(mr);
-uint64_t used_region_size = 0;
+struct memory_devices_info info = {};
 
 /* we will need a new memory slot for kvm and vhost */
 if (kvm_enabled() && !kvm_get_free_memslots()) {
@@ -84,12 +88,12 @@ static void memory_device_check_addable(MachineState *ms, 
MemoryRegion *mr,
 }
 
 /* will we exceed the total amount of memory specified */
-memory_device_used_region_size(OBJECT(ms), &used_region_size);
-if (used_region_size + size < used_region_size ||
-used_region_size + size > ms->maxram_size - ms->ram_size) {
+memory_devices_collect_info(OBJECT(ms), &info);
+if (info.region_size + size < info.region_size ||
+info.region_size + size > ms->maxram_size - ms->ram_size) {
 error_setg(errp, "not enough space, currently 0x%" PRIx64
" in use of total space for memory devices 0x" RAM_ADDR_FMT,
-   used_region_size, ms->maxram_size - ms->ram_size);
+   info.region_size, ms->maxram_size - ms->ram_size);
 return;
 }
 
-- 
2.31.1

[PATCH v1 11/12] virtio-mem: Set the RamDiscardManager for the RAM memory region earlier

2021-10-27 Thread David Hildenbrand

Let's set the RamDiscardManager earlier, logically before we expose the
RAM memory region to the system. This is a preparation for further changes
and is logically cleaner: before we expose the RAM memory region to
migration code, make sure we have the RamDiscardManager setup.

Signed-off-by: David Hildenbrand 
---
 hw/virtio/virtio-mem.c | 19 ---
 1 file changed, 8 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 0f5eae4a7e..1e29706798 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -773,16 +773,17 @@ static void virtio_mem_device_realize(DeviceState *dev, 
Error **errp)
 sizeof(struct virtio_mem_config));
 vmem->vq = virtio_add_queue(vdev, 128, virtio_mem_handle_request);
 
-host_memory_backend_set_mapped(vmem->memdev, true);
-vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
-qemu_register_reset(virtio_mem_system_reset, vmem);
-
 /*
- * Set ourselves as RamDiscardManager before the plug handler maps the
- * memory region and exposes it via an address space.
+ * Set ourselves as RamDiscardManager before we expose the memory region
+ * to the system (e.g., marking the RAMBlock migratable, mapping the
+ * region).
  */
 memory_region_set_ram_discard_manager(&vmem->memdev->mr,
   RAM_DISCARD_MANAGER(vmem));
+
+host_memory_backend_set_mapped(vmem->memdev, true);
+vmstate_register_ram(&vmem->memdev->mr, DEVICE(vmem));
+qemu_register_reset(virtio_mem_system_reset, vmem);
 }
 
 static void virtio_mem_device_unrealize(DeviceState *dev)
@@ -790,14 +791,10 @@ static void virtio_mem_device_unrealize(DeviceState *dev)
 VirtIODevice *vdev = VIRTIO_DEVICE(dev);
 VirtIOMEM *vmem = VIRTIO_MEM(dev);
 
-/*
- * The unplug handler unmapped the memory region, it cannot be
- * found via an address space anymore. Unset ourselves.
- */
-memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
 qemu_unregister_reset(virtio_mem_system_reset, vmem);
 vmstate_unregister_ram(&vmem->memdev->mr, DEVICE(vmem));
 host_memory_backend_set_mapped(vmem->memdev, false);
+memory_region_set_ram_discard_manager(&vmem->memdev->mr, NULL);
 virtio_del_queue(vdev, 0);
 virtio_cleanup(vdev);
 g_free(vmem->bitmap);
-- 
2.31.1

[PATCH v1 05/12] memory-device: Move memory_device_check_addable() directly into memory_device_pre_plug()

2021-10-27 Thread David Hildenbrand

Move it out of memory_device_get_free_addr(), which is cleaner and
prepares for future changes.

Signed-off-by: David Hildenbrand 
---
 hw/mem/memory-device.c | 15 +++
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 7f76a09e57..68a2c3dbcc 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -67,9 +67,10 @@ static int memory_device_used_region_size(Object *obj, void 
*opaque)
 return 0;
 }
 
-static void memory_device_check_addable(MachineState *ms, uint64_t size,
+static void memory_device_check_addable(MachineState *ms, MemoryRegion *mr,
 Error **errp)
 {
+const uint64_t size = memory_region_size(mr);
 uint64_t used_region_size = 0;
 
 /* we will need a new memory slot for kvm and vhost */
@@ -99,7 +100,6 @@ static uint64_t memory_device_get_free_addr(MachineState *ms,
 uint64_t align, uint64_t size,
 Error **errp)
 {
-Error *err = NULL;
 GSList *list = NULL, *item;
 Range as, new = range_empty;
 
@@ -125,12 +125,6 @@ static uint64_t memory_device_get_free_addr(MachineState 
*ms,
 align);
 }
 
-memory_device_check_addable(ms, size, &err);
-if (err) {
-error_propagate(errp, err);
-return 0;
-}
-
 if (hint && !QEMU_IS_ALIGNED(*hint, align)) {
 error_setg(errp, "address must be aligned to 0x%" PRIx64 " bytes",
align);
@@ -259,6 +253,11 @@ void memory_device_pre_plug(MemoryDeviceState *md, 
MachineState *ms,
 goto out;
 }
 
+memory_device_check_addable(ms, mr, &local_err);
+if (local_err) {
+goto out;
+}
+
 if (legacy_align) {
 align = *legacy_align;
 } else {
-- 
2.31.1

Re: MMIO/PIO dispatch file descriptors (ioregionfd) design discussion

2021-10-27 Thread John Levon

On Wed, Oct 27, 2021 at 11:15:35AM +0100, Stefan Hajnoczi wrote:

> > > I like this approach as well.
> > > As you have mentioned, the device emulation code with first approach
> > > does have to how to handle the region accesses. The second approach will
> > > make things more transparent. Let me see how can I modify what there is
> > > there now and may ask further questions.
> > 
> > Sorry I'm a bit late to this discussion, I'm not clear on the above WRT
> > vfio-user. If an ioregionfd has to cover a whole BAR0 (?), how would this
> > interact with partly-mmap()able regions like we do with SPDK/vfio-user/NVMe?
> 
> The ioregionfd doesn't need to cover an entire BAR. QEMU's MemoryRegions
> form a hierarchy, so it's possible to sub-divide the BAR into several
> MemoryRegions.

I think you're saying that when vfio-user client in qemu calls
VFIO_USER_DEVICE_GET_REGION_IO_FDS, it would create a sub-MR corresponding to
each one, before asking KVM to configure them?

thanks
john

[PATCH v1 07/12] memory-device: Support memory devices that dynamically consume multiple memslots

2021-10-27 Thread David Hildenbrand

We want to support memory devices that have a container as device memory
region, and (dynamically) map individual chunks into that container
resulting in multiple memslots getting consumed by such a device. So we
want to support memory devices that require a) multiple memslots and
b) dynamically make use of these memslots.

We already have one device that uses a container as device memory region:
NVDIMM. However, an NVDIMM also end up consuming exactly one memslot.

The target use case will be virtio-mem, which will dynamically map
parts of a source RAM memory region into the container device region
using aliases, consuming one memslot per alias.

We need a way to query from a memory device:
* The currently used number memslots.
* The total number of memslots that might get used across device
  lifetime.

Expose one helper functions -- memory_devices_get_reserved_memslots() --
that will be used by vhost code to respect the current memslot reservation
when realizing vhost device.

Limit the number of memslots usable by memory devices to something sane
-- for now 2048 -- we really don't want to create tens of thousands of
memslots that can degrade performance when traversing an address space,
just because it would be possible (for example, without KVM and without
vhost there is no real limit ...). This does not affect existing setups
(for example more than 256 DIMMs+NVDIMMs is not supported).

Signed-off-by: David Hildenbrand 
---
 hw/mem/memory-device.c | 84 ++
 include/hw/mem/memory-device.h | 33 +
 stubs/qmp_memory_device.c  |  5 ++
 3 files changed, 113 insertions(+), 9 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index a915894819..bb02eb410a 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -18,6 +18,8 @@
 #include "sysemu/kvm.h"
 #include "trace.h"
 
+#define MEMORY_DEVICES_MAX_MEMSLOTS 2048
+
 static gint memory_device_addr_sort(gconstpointer a, gconstpointer b)
 {
 const MemoryDeviceState *md_a = MEMORY_DEVICE(a);
@@ -50,8 +52,28 @@ static int memory_device_build_list(Object *obj, void 
*opaque)
 return 0;
 }
 
+static unsigned int memory_device_get_used_memslots(const MemoryDeviceState 
*md)
+{
+const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+
+if (!mdc->get_used_memslots)
+return 1;
+return mdc->get_used_memslots(md, &error_abort);
+}
+
+static unsigned int memory_device_get_memslots(const MemoryDeviceState *md)
+{
+const MemoryDeviceClass *mdc = MEMORY_DEVICE_GET_CLASS(md);
+
+if (!mdc->get_memslots)
+return 1;
+return mdc->get_memslots(md, &error_abort);
+}
+
 struct memory_devices_info {
 uint64_t region_size;
+unsigned int used_memslots;
+unsigned int reserved_memslots;
 };
 
 static int memory_devices_collect_info(Object *obj, void *opaque)
@@ -61,9 +83,15 @@ static int memory_devices_collect_info(Object *obj, void 
*opaque)
 if (object_dynamic_cast(obj, TYPE_MEMORY_DEVICE)) {
 const DeviceState *dev = DEVICE(obj);
 const MemoryDeviceState *md = MEMORY_DEVICE(obj);
+unsigned int used, total;
 
 if (dev->realized) {
 i->region_size += memory_device_get_region_size(md, &error_abort);
+
+used = memory_device_get_used_memslots(md);
+total = memory_device_get_memslots(md);
+i->used_memslots += used;
+i->reserved_memslots += total - used;
 }
 }
 
@@ -71,24 +99,62 @@ static int memory_devices_collect_info(Object *obj, void 
*opaque)
 return 0;
 }
 
-static void memory_device_check_addable(MachineState *ms, MemoryRegion *mr,
-Error **errp)
+/*
+ * Get the number of memslots that are reserved (not used yet but will get used
+ * dynamically in the future without further checks) by all memory devices.
+ */
+unsigned int memory_devices_get_reserved_memslots(void)
+{
+struct memory_devices_info info = {};
+
+memory_devices_collect_info(qdev_get_machine(), &info);
+return info.reserved_memslots;
+}
+
+static void memory_device_check_addable(MachineState *ms, MemoryDeviceState 
*md,
+MemoryRegion *mr, Error **errp)
 {
 const uint64_t size = memory_region_size(mr);
 struct memory_devices_info info = {};
+unsigned int required, reserved;
+
+memory_devices_collect_info(OBJECT(ms), &info);
+reserved = info.reserved_memslots;
+required = memory_device_get_memslots(md);
 
-/* we will need a new memory slot for kvm and vhost */
-if (kvm_enabled() && !kvm_get_free_memslots()) {
-error_setg(errp, "hypervisor has no free memory slots left");
+/*
+ * Limit the maximum number of memslot used by memory devices to something
+ * sane.
+ */
+if (info.used_memslots + reserved + required >
+MEMORY_DEVICES_MAX_MEMSLOTS) {
+error_setg(errp, "The maximum number of memory slots

[PATCH v1 08/12] vhost: Respect reserved memslots for memory devices when realizing a vhost device

2021-10-27 Thread David Hildenbrand

Make sure that the current reservations can be fulfilled, otherwise we
might run out of memslots later when memory devices start actually using
the reserved memslots and crash.

Signed-off-by: David Hildenbrand 
---
 hw/virtio/vhost.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 49a1074097..a8f0d0bdf7 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -23,6 +23,7 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
+#include "hw/mem/memory-device.h"
 #include "migration/blocker.h"
 #include "migration/qemu-file-types.h"
 #include "sysemu/dma.h"
@@ -1319,7 +1320,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
Error **errp)
 {
 uint64_t features;
-int i, r, n_initialized_vqs = 0;
+int i, r, reserved_memslots, backend_memslots, n_initialized_vqs = 0;
 
 hdev->vdev = NULL;
 hdev->migration_blocker = NULL;
@@ -1415,9 +1416,13 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 memory_listener_register(&hdev->memory_listener, &address_space_memory);
 QLIST_INSERT_HEAD(&vhost_devices, hdev, entry);
 
-if (used_memslots > hdev->vhost_ops->vhost_backend_memslots_limit(hdev)) {
-error_setg(errp, "vhost backend memory slots limit is less"
-   " than current number of present memory slots");
+reserved_memslots = memory_devices_get_reserved_memslots();
+backend_memslots = hdev->vhost_ops->vhost_backend_memslots_limit(hdev);
+if (used_memslots + reserved_memslots > backend_memslots) {
+error_setg(errp, "vhost backend memory slots limit (%d) is less"
+   " than current number of used (%d) and reserved (%d)"
+   " memory slots", backend_memslots, used_memslots,
+   reserved_memslots);
 r = -EINVAL;
 goto fail_busyloop;
 }
-- 
2.31.1

[PATCH v1 12/12] virtio-mem: Expose device memory via multiple memslots

2021-10-27 Thread David Hildenbrand

We want to expose virtio-mem device memory via multiple memslots to the
guest on demand, essentially reducing the total size of KVM slots
significantly (and thereby metadata in KVM and in QEMU for KVM memory
slots) especially when exposing initially only a small amount of memory via
a virtio-mem device to the guest, to hotplug more memory later. Further,
not always exposing the full device memory region to the guest reduces the
attack surface in many setups without requiring other mechanisms like
userfaultfd for protection of unplugged memory.

So split the original RAM region via memory region aliases into separate
memory slots, and dynamically map the required memory slots into the
container.

For now, we always map the memslots covered by the usable region. In the
future, with VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to map
memslots on actual demand and optimize further.

The user is in charge of setting the number of memslots the device
should use. In the future, we might add an auto mode, specified via
"memslots=0" for user convenience.

There are two new properties:

1) "used-memslots" contains how many memslots are currently used
 and is read-only. Used internally, but can also be used for debugging/
 introspection purposes.

2) "memslots" specifies how many memslots the device is should use.
 * "1" is the default and corresponds mostly to the old behavior. The
   only exception is that with a usable region size of 0, the single
   memslot won't get used and nothing will get mapped.
 * "> 1" tells the device to use the given number of memslots. There are
   a couple of restrictions:
   * Cannot be bigger than 1024 or equal to 0.
   * Cannot be bigger than the number of device blocks.
   * Must not result in memslots that are smaller than the minimum
 memslot size
 This parameter doesn't have to be migrated and can differ between
 source and destination.

The minimum memslot size (and thereby the alignment of memslots in
guest physical address space) is currently defined to be 128 MiB --
a memory slot size we know works (due to x86-64 DIMMs) without confusing
devices that might not be able to handle crossing memory regions in
address spaces when performing I/O.

Signed-off-by: David Hildenbrand 
---
 hw/virtio/virtio-mem-pci.c |  23 +
 hw/virtio/virtio-mem.c | 183 -
 include/hw/virtio/virtio-mem.h |  25 -
 3 files changed, 226 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/virtio-mem-pci.c b/hw/virtio/virtio-mem-pci.c
index be2383b0c5..1dc4078941 100644
--- a/hw/virtio/virtio-mem-pci.c
+++ b/hw/virtio/virtio-mem-pci.c
@@ -82,6 +82,21 @@ static uint64_t virtio_mem_pci_get_min_alignment(const 
MemoryDeviceState *md)
 &error_abort);
 }
 
+static unsigned int virtio_mem_pci_get_used_memslots(
+const MemoryDeviceState 
*md,
+ Error **errp)
+{
+return object_property_get_uint(OBJECT(md), VIRTIO_MEM_USED_MEMSLOTS_PROP,
+&error_abort);
+}
+
+static unsigned int virtio_mem_pci_get_memslots(const MemoryDeviceState *md,
+Error **errp)
+{
+return object_property_get_uint(OBJECT(md), VIRTIO_MEM_MEMSLOTS_PROP,
+&error_abort);
+}
+
 static void virtio_mem_pci_size_change_notify(Notifier *notifier, void *data)
 {
 VirtIOMEMPCI *pci_mem = container_of(notifier, VirtIOMEMPCI,
@@ -115,6 +130,8 @@ static void virtio_mem_pci_class_init(ObjectClass *klass, 
void *data)
 mdc->get_memory_region = virtio_mem_pci_get_memory_region;
 mdc->fill_device_info = virtio_mem_pci_fill_device_info;
 mdc->get_min_alignment = virtio_mem_pci_get_min_alignment;
+mdc->get_used_memslots = virtio_mem_pci_get_used_memslots;
+mdc->get_memslots = virtio_mem_pci_get_memslots;
 }
 
 static void virtio_mem_pci_instance_init(Object *obj)
@@ -142,6 +159,12 @@ static void virtio_mem_pci_instance_init(Object *obj)
 object_property_add_alias(obj, VIRTIO_MEM_REQUESTED_SIZE_PROP,
   OBJECT(&dev->vdev),
   VIRTIO_MEM_REQUESTED_SIZE_PROP);
+object_property_add_alias(obj, VIRTIO_MEM_MEMSLOTS_PROP,
+  OBJECT(&dev->vdev),
+  VIRTIO_MEM_MEMSLOTS_PROP);
+object_property_add_alias(obj, VIRTIO_MEM_USED_MEMSLOTS_PROP,
+  OBJECT(&dev->vdev),
+  VIRTIO_MEM_USED_MEMSLOTS_PROP);
 }
 
 static const VirtioPCIDeviceTypeInfo virtio_mem_pci_info = {
diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index 1e29706798..f0ad365b91 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -23,6 +23,7 @@
 #include "hw/virtio/virtio-bus.h"
 #include "hw/virtio/virtio-access.h"
 #include "hw/virtio/virtio-mem.h"
+#include "hw/

[PATCH v1 00/12] virtio-mem: Expose device memory via multiple memslots

2021-10-27 Thread David Hildenbrand

This is the follow-up of [1], dropping auto-detection and vhost-user
changes from the initial RFC.

Based-on: 20211011175346.15499-1-da...@redhat.com

A virtio-mem device is represented by a single large RAM memory region
backed by a single large mmap.

Right now, we map that complete memory region into guest physical addres
space, resulting in a very large memory mapping, KVM memory slot, ...
although only a small amount of memory might actually be exposed to the VM.

For example, when starting a VM with a 1 TiB virtio-mem device that only
exposes little device memory (e.g., 1 GiB) towards the VM initialliy,
in order to hotplug more memory later, we waste a lot of memory on metadata
for KVM memory slots (> 2 GiB!) and accompanied bitmaps. Although some
optimizations in KVM are being worked on to reduce this metadata overhead
on x86-64 in some cases, it remains a problem with nested VMs and there are
other reasons why we would want to reduce the total memory slot to a
reasonable minimum.

We want to:
a) Reduce the metadata overhead, including bitmap sizes inside KVM but also
   inside QEMU KVM code where possible.
b) Not always expose all device-memory to the VM, to reduce the attack
   surface of malicious VMs without using userfaultfd.

So instead, expose the RAM memory region not by a single large mapping
(consuming one memslot) but instead by multiple mappings, each consuming
one memslot. To do that, we divide the RAM memory region via aliases into
separate parts and only map the aliases into a device container we actually
need. We have to make sure that QEMU won't silently merge the memory
sections corresponding to the aliases (and thereby also memslots),
otherwise we lose atomic updates with KVM and vhost-user, which we deeply
care about when adding/removing memory. Further, to get memslot accounting
right, such merging is better avoided.

Within the memslots, virtio-mem can (un)plug memory in smaller granularity
dynamically. So memslots are a pure optimization to tackle a) and b) above.

The user configures how many memslots a virtio-mem device should use, the
default is "1" -- essentially corresponding to the old behavior.

Memslots are right now mapped once they fall into the usable device region
(which grows/shrinks on demand right now either when requesting to
 hotplug more memory or during/after reboots). In the future, with
VIRTIO_MEM_F_UNPLUGGED_INACCESSIBLE, we'll be able to (un)map aliases even
more dynamically when (un)plugging device blocks.


Adding a 500GiB virtio-mem device with "memslots=500" and not hotplugging
any memory results in:
00014000-01047fff (prio 0, i/o): device-memory
  00014000-007e3fff (prio 0, i/o): virtio-mem-memslots

Requesting the VM to consume 2 GiB results in (note: the usable region size
is bigger than 2 GiB, so 3 * 1 GiB memslots are required):
00014000-01047fff (prio 0, i/o): device-memory
  00014000-007e3fff (prio 0, i/o): virtio-mem-memslots
00014000-00017fff (prio 0, ram): alias 
virtio-mem-memslot-0 @mem0 -3fff
00018000-0001bfff (prio 0, ram): alias 
virtio-mem-memslot-1 @mem0 4000-7fff
0001c000-0001 (prio 0, ram): alias 
virtio-mem-memslot-2 @mem0 8000-bfff

Requesting the VM to consume 20 GiB results in:
00014000-01047fff (prio 0, i/o): device-memory
  00014000-007e3fff (prio 0, i/o): virtio-mem-memslots
00014000-00017fff (prio 0, ram): alias 
virtio-mem-memslot-0 @mem0 -3fff
00018000-0001bfff (prio 0, ram): alias 
virtio-mem-memslot-1 @mem0 4000-7fff
0001c000-0001 (prio 0, ram): alias 
virtio-mem-memslot-2 @mem0 8000-bfff
0002-00023fff (prio 0, ram): alias 
virtio-mem-memslot-3 @mem0 c000-
00024000-00027fff (prio 0, ram): alias 
virtio-mem-memslot-4 @mem0 0001-00013fff
00028000-0002bfff (prio 0, ram): alias 
virtio-mem-memslot-5 @mem0 00014000-00017fff
0002c000-0002 (prio 0, ram): alias 
virtio-mem-memslot-6 @mem0 00018000-0001bfff
0003-00033fff (prio 0, ram): alias 
virtio-mem-memslot-7 @mem0 0001c000-0001
00034000-00037fff (prio 0, ram): alias 
virtio-mem-memslot-8 @mem0 0002-00023fff
00038000-0003bfff (prio 0, ram): alias 
virtio-mem-memslot-9 @mem0 00024000-00027fff
0003c000-0003 (prio 0, ram): alias 
virtio-mem-memslot-10 @mem0 00028000-0002bfff
0004-00043

[PATCH v1 09/12] memory: Drop mapping check from memory_region_get_ram_discard_manager()

2021-10-27 Thread David Hildenbrand

It's sufficient to check whether a memory region is RAM, the region
doesn't necessarily have to be mapped into another memory region.
For example, RAM memory regions mapped via an alias will never make
"memory_region_is_mapped()" succeed.

Signed-off-by: David Hildenbrand 
---
 softmmu/memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/softmmu/memory.c b/softmmu/memory.c
index d8a584ca80..2496bff729 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2045,7 +2045,7 @@ int memory_region_iommu_num_indexes(IOMMUMemoryRegion 
*iommu_mr)
 
 RamDiscardManager *memory_region_get_ram_discard_manager(MemoryRegion *mr)
 {
-if (!memory_region_is_mapped(mr) || !memory_region_is_ram(mr)) {
+if (!memory_region_is_ram(mr)) {
 return NULL;
 }
 return mr->rdm;
-- 
2.31.1

[PATCH v1 01/12] kvm: Return number of free memslots

2021-10-27 Thread David Hildenbrand

Let's return the number of free slots instead of only checking if there
is a free slot. Required to support memory devices that consume multiple
memslots.

Signed-off-by: David Hildenbrand 
---
 accel/kvm/kvm-all.c| 24 +++-
 accel/stubs/kvm-stub.c |  4 ++--
 hw/mem/memory-device.c |  2 +-
 include/sysemu/kvm.h   |  2 +-
 4 files changed, 15 insertions(+), 17 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index db8d83b137..0846be835e 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -103,6 +103,7 @@ struct KVMState
 AccelState parent_obj;
 
 int nr_slots;
+int nr_free_slots;
 int fd;
 int vmfd;
 int coalesced_mmio;
@@ -245,6 +246,13 @@ int kvm_get_max_memslots(void)
 return s->nr_slots;
 }
 
+unsigned int kvm_get_free_memslots(void)
+{
+KVMState *s = kvm_state;
+
+return s->nr_free_slots;
+}
+
 /* Called with KVMMemoryListener.slots_lock held */
 static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
 {
@@ -260,19 +268,6 @@ static KVMSlot *kvm_get_free_slot(KVMMemoryListener *kml)
 return NULL;
 }
 
-bool kvm_has_free_slot(MachineState *ms)
-{
-KVMState *s = KVM_STATE(ms->accelerator);
-bool result;
-KVMMemoryListener *kml = &s->memory_listener;
-
-kvm_slots_lock();
-result = !!kvm_get_free_slot(kml);
-kvm_slots_unlock();
-
-return result;
-}
-
 /* Called with KVMMemoryListener.slots_lock held */
 static KVMSlot *kvm_alloc_slot(KVMMemoryListener *kml)
 {
@@ -1410,6 +1405,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 }
 start_addr += slot_size;
 size -= slot_size;
+kvm_state->nr_free_slots++;
 } while (size);
 goto out;
 }
@@ -1435,6 +1431,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
 ram_start_offset += slot_size;
 ram += slot_size;
 size -= slot_size;
+kvm_state->nr_free_slots--;
 } while (size);
 
 out:
@@ -2364,6 +2361,7 @@ static int kvm_init(MachineState *ms)
 if (!s->nr_slots) {
 s->nr_slots = 32;
 }
+s->nr_free_slots = s->nr_slots;
 
 s->nr_as = kvm_check_extension(s, KVM_CAP_MULTI_ADDRESS_SPACE);
 if (s->nr_as <= 1) {
diff --git a/accel/stubs/kvm-stub.c b/accel/stubs/kvm-stub.c
index 5b1d00a222..cbaeb7c656 100644
--- a/accel/stubs/kvm-stub.c
+++ b/accel/stubs/kvm-stub.c
@@ -133,9 +133,9 @@ int kvm_irqchip_remove_irqfd_notifier_gsi(KVMState *s, 
EventNotifier *n,
 return -ENOSYS;
 }
 
-bool kvm_has_free_slot(MachineState *ms)
+unsigned int kvm_get_free_memslots(void)
 {
-return false;
+return 0;
 }
 
 void kvm_init_cpu_signals(CPUState *cpu)
diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index d9f8301711..9045ead33e 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -73,7 +73,7 @@ static void memory_device_check_addable(MachineState *ms, 
uint64_t size,
 uint64_t used_region_size = 0;
 
 /* we will need a new memory slot for kvm and vhost */
-if (kvm_enabled() && !kvm_has_free_slot(ms)) {
+if (kvm_enabled() && !kvm_get_free_memslots()) {
 error_setg(errp, "hypervisor has no free memory slots left");
 return;
 }
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index a1ab1ee12d..c18be3cbd5 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -211,7 +211,7 @@ typedef struct Notifier Notifier;
 
 /* external API */
 
-bool kvm_has_free_slot(MachineState *ms);
+unsigned int kvm_get_free_memslots(void);
 bool kvm_has_sync_mmu(void);
 int kvm_has_vcpu_events(void);
 int kvm_has_robust_singlestep(void);
-- 
2.31.1

Re: [PATCH 00/16] fdt: Make OF_BOARD a boolean option

2021-10-27 Thread Tom Rini

On Fri, Oct 15, 2021 at 12:03:44PM -0600, Simon Glass wrote:
> Hi all,
> 
> On Thu, 14 Oct 2021 at 09:28, Tom Rini  wrote:
> >
> > On Thu, Oct 14, 2021 at 09:17:52AM -0600, Simon Glass wrote:
> > > Hi Tom,
> > >
> > > On Thu, 14 Oct 2021 at 08:56, Tom Rini  wrote:
> > > >
> > > > On Wed, Oct 13, 2021 at 12:06:02PM -0600, Simon Glass wrote:
> > > > > Hi François,
> > > > >
> > > > > On Wed, 13 Oct 2021 at 11:35, François Ozog 
> > > > >  wrote:
> > > > > >
> > > > > > Hi Simon
> > > > > >
> > > > > > Le mer. 13 oct. 2021 à 16:49, Simon Glass  a 
> > > > > > écrit :
> > > > > >>
> > > > > >> Hi Tom, Bin,François,
> > > > > >>
> > > > > >> On Tue, 12 Oct 2021 at 19:34, Tom Rini  wrote:
> > > > > >> >
> > > > > >> > On Wed, Oct 13, 2021 at 09:29:14AM +0800, Bin Meng wrote:
> > > > > >> > > Hi Simon,
> > > > > >> > >
> > > > > >> > > On Wed, Oct 13, 2021 at 9:01 AM Simon Glass 
> > > > > >> > >  wrote:
> > > > > >> > > >
> > > > > >> > > > With Ilias' efforts we have dropped OF_PRIOR_STAGE and 
> > > > > >> > > > OF_HOSTFILE so
> > > > > >> > > > there are only three ways to obtain a devicetree:
> > > > > >> > > >
> > > > > >> > > >- OF_SEPARATE - the normal way, where the devicetree is 
> > > > > >> > > > built and
> > > > > >> > > >   appended to U-Boot
> > > > > >> > > >- OF_EMBED - for development purposes, the devicetree is 
> > > > > >> > > > embedded in
> > > > > >> > > >   the ELF file (also used for EFI)
> > > > > >> > > >- OF_BOARD - the board figures it out on its own
> > > > > >> > > >
> > > > > >> > > > The last one is currently set up so that no devicetree is 
> > > > > >> > > > needed at all
> > > > > >> > > > in the U-Boot tree. Most boards do provide one, but some 
> > > > > >> > > > don't. Some
> > > > > >> > > > don't even provide instructions on how to boot on the board.
> > > > > >> > > >
> > > > > >> > > > The problems with this approach are documented at [1].
> > > > > >> > > >
> > > > > >> > > > In practice, OF_BOARD is not really distinct from 
> > > > > >> > > > OF_SEPARATE. Any board
> > > > > >> > > > can obtain its devicetree at runtime, even it is has a 
> > > > > >> > > > devicetree built
> > > > > >> > > > in U-Boot. This is because U-Boot may be a second-stage 
> > > > > >> > > > bootloader and its
> > > > > >> > > > caller may have a better idea about the hardware available 
> > > > > >> > > > in the machine.
> > > > > >> > > > This is the case with a few QEMU boards, for example.
> > > > > >> > > >
> > > > > >> > > > So it makes no sense to have OF_BOARD as a 'choice'. It 
> > > > > >> > > > should be an
> > > > > >> > > > option, available with either OF_SEPARATE or OF_EMBED.
> > > > > >> > > >
> > > > > >> > > > This series makes this change, adding various missing 
> > > > > >> > > > devicetree files
> > > > > >> > > > (and placeholders) to make the build work.
> > > > > >> > >
> > > > > >> > > Adding device trees that are never used sounds like a hack to 
> > > > > >> > > me.
> > > > > >> > >
> > > > > >> > > For QEMU, device tree is dynamically generated on the fly 
> > > > > >> > > based on
> > > > > >> > > command line parameters, and the device tree you put in this 
> > > > > >> > > series
> > > > > >> > > has various hardcoded  values which normally do not 
> > > > > >> > > show up
> > > > > >> > > in hand-written dts files.
> > > > > >> > >
> > > > > >> > > I am not sure I understand the whole point of this.
> > > > > >> >
> > > > > >> > I am also confused and do not like the idea of adding device 
> > > > > >> > trees for
> > > > > >> > platforms that are capable of and can / do have a device tree to 
> > > > > >> > give us
> > > > > >> > at run time.
> > > > > >>
> > > > > >> (I'll just reply to this one email, since the same points applies 
> > > > > >> to
> > > > > >> all replies I think)
> > > > > >>
> > > > > >> I have been thinking about this and discussing it with people for a
> > > > > >> few months now. I've been signalling a change like this for over a
> > > > > >> month now, on U-Boot contributor calls and in discussions with 
> > > > > >> Linaro
> > > > > >> people. I sent a patch (below) to try to explain things. I hope it 
> > > > > >> is
> > > > > >> not a surprise!
> > > > > >>
> > > > > >> The issue here is that we need a devicetree in-tree in U-Boot, to
> > > > > >> avoid the mess that has been created by OF_PRIOR_STAGE, OF_BOARD,
> > > > > >> BINMAN_STANDALONE_FDT and to a lesser extent, OF_HOSTFILE. Between
> > > > > >> Ilias' series and this one we can get ourselves on a stronger 
> > > > > >> footing.
> > > > > >> There is just OF_SEPARATE, with OF_EMBED for debugging/ELF use.
> > > > > >> For more context:
> > > > > >>
> > > > > >> http://patchwork.ozlabs.org/project/uboot/patch/20210919215111.3830278-3-...@chromium.org/
> > > > > >>
> > > > > >> BTW I did suggest to QEMU ARM that they support a way of adding the
> > > > > >> u-boot.dtsi but there was not much interest there (in fact the
> > > > > >> maintainer would prefer

[PATCH v1 10/12] virtio-mem: Fix typo in virito_mem_intersect_memory_section() function name

2021-10-27 Thread David Hildenbrand

It's "virtio", not "virito".

Signed-off-by: David Hildenbrand 
---
 hw/virtio/virtio-mem.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/hw/virtio/virtio-mem.c b/hw/virtio/virtio-mem.c
index d5a578142b..0f5eae4a7e 100644
--- a/hw/virtio/virtio-mem.c
+++ b/hw/virtio/virtio-mem.c
@@ -177,7 +177,7 @@ static int virtio_mem_for_each_unplugged_range(const 
VirtIOMEM *vmem, void *arg,
  *
  * Returns false if the intersection is empty, otherwise returns true.
  */
-static bool virito_mem_intersect_memory_section(MemoryRegionSection *s,
+static bool virtio_mem_intersect_memory_section(MemoryRegionSection *s,
 uint64_t offset, uint64_t size)
 {
 uint64_t start = MAX(s->offset_within_region, offset);
@@ -215,7 +215,7 @@ static int virtio_mem_for_each_plugged_section(const 
VirtIOMEM *vmem,
   first_bit + 1) - 1;
 size = (last_bit - first_bit + 1) * vmem->block_size;
 
-if (!virito_mem_intersect_memory_section(&tmp, offset, size)) {
+if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
 break;
 }
 ret = cb(&tmp, arg);
@@ -247,7 +247,7 @@ static int virtio_mem_for_each_unplugged_section(const 
VirtIOMEM *vmem,
  first_bit + 1) - 1;
 size = (last_bit - first_bit + 1) * vmem->block_size;
 
-if (!virito_mem_intersect_memory_section(&tmp, offset, size)) {
+if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
 break;
 }
 ret = cb(&tmp, arg);
@@ -283,7 +283,7 @@ static void virtio_mem_notify_unplug(VirtIOMEM *vmem, 
uint64_t offset,
 QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
 MemoryRegionSection tmp = *rdl->section;
 
-if (!virito_mem_intersect_memory_section(&tmp, offset, size)) {
+if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
 continue;
 }
 rdl->notify_discard(rdl, &tmp);
@@ -299,7 +299,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t 
offset,
 QLIST_FOREACH(rdl, &vmem->rdl_list, next) {
 MemoryRegionSection tmp = *rdl->section;
 
-if (!virito_mem_intersect_memory_section(&tmp, offset, size)) {
+if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
 continue;
 }
 ret = rdl->notify_populate(rdl, &tmp);
@@ -316,7 +316,7 @@ static int virtio_mem_notify_plug(VirtIOMEM *vmem, uint64_t 
offset,
 if (rdl2 == rdl) {
 break;
 }
-if (!virito_mem_intersect_memory_section(&tmp, offset, size)) {
+if (!virtio_mem_intersect_memory_section(&tmp, offset, size)) {
 continue;
 }
 rdl2->notify_discard(rdl2, &tmp);
-- 
2.31.1

[PATCH v1 04/12] vhost: Don't merge unmergeable memory sections

2021-10-27 Thread David Hildenbrand

Memory sections that are marked unmergeable should not be merged, to
allow for atomic removal later.

Signed-off-by: David Hildenbrand 
---
 hw/virtio/vhost.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 2707972870..49a1074097 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -620,7 +620,7 @@ static void vhost_region_add_section(struct vhost_dev *dev,
mrs_size, mrs_host);
 }
 
-if (dev->n_tmp_sections) {
+if (dev->n_tmp_sections && !section->unmergeable) {
 /* Since we already have at least one section, lets see if
  * this extends it; since we're scanning in order, we only
  * have to look at the last one, and the FlatView that calls
@@ -653,7 +653,7 @@ static void vhost_region_add_section(struct vhost_dev *dev,
 size_t offset = mrs_gpa - prev_gpa_start;
 
 if (prev_host_start + offset == mrs_host &&
-section->mr == prev_sec->mr &&
+section->mr == prev_sec->mr && !prev_sec->unmergeable &&
 (!dev->vhost_ops->vhost_backend_can_merge ||
  dev->vhost_ops->vhost_backend_can_merge(dev,
 mrs_host, mrs_size,
-- 
2.31.1

Re: [PATCH v2 1/3] file-posix: add `aio-max-batch` option

2021-10-27 Thread Kevin Wolf

Am 27.10.2021 um 11:23 hat Stefano Garzarella geschrieben:
> On Wed, Oct 27, 2021 at 06:28:51AM +0200, Markus Armbruster wrote:
> > Stefano Garzarella  writes:
> > 
> > > Commit d7ddd0a161 ("linux-aio: limit the batch size using
> > > `aio-max-batch` parameter") added a way to limit the batch size
> > > of Linux AIO backend for the entire AIO context.
> > > 
> > > The same AIO context can be shared by multiple devices, so
> > > latency-sensitive devices may want to limit the batch size even
> > > more to avoid increasing latency.
> > > 
> > > For this reason we add the `aio-max-batch` option to the file
> > > backend, which will be used by the next commits to limit the size of
> > > batches including requests generated by this device.
> > > 
> > > Suggested-by: Kevin Wolf 
> > > Reviewed-by: Kevin Wolf 
> > > Signed-off-by: Stefano Garzarella 
> > > ---
> > > 
> > > Notes:
> > > v2:
> > > - @aio-max-batch documentation rewrite [Stefan, Kevin]
> > > 
> > >  qapi/block-core.json | 7 +++
> > >  block/file-posix.c   | 9 +
> > >  2 files changed, 16 insertions(+)
> > > 
> > > diff --git a/qapi/block-core.json b/qapi/block-core.json
> > > index 6d3217abb6..fef76b0ea2 100644
> > > --- a/qapi/block-core.json
> > > +++ b/qapi/block-core.json
> > > @@ -2896,6 +2896,12 @@
> > >  #  for this device (default: none, forward the commands via 
> > > SG_IO;
> > >  #  since 2.11)
> > >  # @aio: AIO backend (default: threads) (since: 2.8)
> > > +# @aio-max-batch: maximum number of requests to batch together into a 
> > > single
> > > +# submission in the AIO backend. The smallest value 
> > > between
> > > +# this and the aio-max-batch value of the IOThread 
> > > object is
> > > +# chosen.
> > > +# 0 means that the AIO backend will handle it 
> > > automatically.
> > > +# (default: 0, since 6.2)
> > 
> > "(default 0) (since 6.2)" seems to be more common.
> 
> Indeed I wasn't sure, so I followed @drop-cache, the last one added in
> @BlockdevOptionsFile.

Actually, I think your style is more common, both globally and in
block-*:

$ git grep -i '[,;] since' qapi/ | wc -l
17
$ git grep -i '[,;] since' qapi/block* | wc -l
12

Compared to:

$ git grep -i ') (since' qapi/ | wc -l
14
$ git grep -i ') (since' qapi/block* | wc -l
7

Also a few instances with "(since: ...; default: ...)", but none in that
order with separate brackets.

So I'd rather merge this version if this is the only comment.

Kevin

Re: [PATCH v5 6/8] iotests/300: avoid abnormal shutdown race condition

2021-10-27 Thread Kevin Wolf

Am 26.10.2021 um 19:56 hat John Snow geschrieben:
> Wait for the destination VM to close itself instead of racing to shut it
> down first, which produces different error log messages from AQMP
> depending on precisely when we tried to shut it down.
> 
> (For example: We may try to issue 'quit' immediately prior to the target
> VM closing its QMP socket, which will cause an ECONNRESET error to be
> logged. Waiting for the VM to exit itself avoids the race on shutdown
> behavior.)
> 
> Reported-by: Hanna Reitz 
> Signed-off-by: John Snow 

I think this will still expose the race I described in my comment on
patch 2.

Kevin

[PATCH v1 03/12] memory: Allow for marking memory region aliases unmergeable

2021-10-27 Thread David Hildenbrand

Let's allow for marking memory region aliases unmergeable, to teach
flatview code (and memory listeners like vhost-user) to not merge adjacent
aliases to the same memory region into a larger memory section; instead, we
want separate alias to stay separate such that we can atomically map/unmap
aliases without affecting other aliases.

This is a preparation for virtio-mem mapping device memory located
on a RAM memory region via separate aliases into a memory region container,
resulting in separate memslots that can get (un)mapped atomically.

As an example with virtio-mem, the layout looks something like this:
  [...]
  00018000-02007fff (prio 0, i/o): device-memory
 00018000-01017fff (prio 0, i/o): virtio-mem-memslots
   00018000-0001bfff (prio 0, ram): alias 
virtio-mem-memslot-0 @mem0 -3fff
   0001c000-0001 (prio 0, ram): alias 
virtio-mem-memslot-1 @mem0 4000-7fff
   0002-00023fff (prio 0, ram): alias 
virtio-mem-memslot-2 @mem0 8000-bfff
  [...]

What would happen right now is that flatview code merged all 3 aliases
into a single memory section. When mapping another alias (e.g.,
virtio-mem-memslot-3) or when unmapping any of the mapped aliases,
memory listeners will first get notified about the removal of the big
memory section to then get notified about re-adding of the new
(differently merged) memory section(s).

In an ideal world, memory listeners would be able to deal with that
atomically, however, that is not the case for the most important memory
listeners used in context of virtio-mem (KVM, vhost-user, vfio) and
supporting atomic updates is quite hard (e.g., for KVM where we cannot
simply resize or split memory slots due to allocated metadata per slot,
or in virtiofsd where we cannot simply resize or split an active mmap
mapping). While temporarily removing memslots, active users (e.g., KVM
VCPUs) can stumble over the missing memslot and essentially crash the
VM.

Further, merged chunks will consume less memslots, but we might end up
consuming more later, when unmapping chunks and splitting the bigger
chunks into smaller ones -- making memslot accounting for memory devices
problematic as well.

Let's allow for marking a memory region alias unmergeable, such that we
can atomically (un)map aliases to the same memory region, similar to
(un)mapping individual DIMMs.

Signed-off-by: David Hildenbrand 
---
 include/exec/memory.h | 23 +++
 softmmu/memory.c  | 33 +++--
 2 files changed, 50 insertions(+), 6 deletions(-)

diff --git a/include/exec/memory.h b/include/exec/memory.h
index 75b4f600e3..d877b80e6e 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -82,6 +82,7 @@ struct ReservedRegion {
  * relative to the region's address space
  * @readonly: writes to this section are ignored
  * @nonvolatile: this section is non-volatile
+ * @unmergeable: this section should not get merged with adjacent sections
  */
 struct MemoryRegionSection {
 Int128 size;
@@ -91,6 +92,7 @@ struct MemoryRegionSection {
 hwaddr offset_within_address_space;
 bool readonly;
 bool nonvolatile;
+bool unmergeable;
 };
 
 typedef struct IOMMUTLBEntry IOMMUTLBEntry;
@@ -720,6 +722,7 @@ struct MemoryRegion {
 bool nonvolatile;
 bool rom_device;
 bool flush_coalesced_mmio;
+bool unmergeable;
 uint8_t dirty_log_mask;
 bool is_iommu;
 RAMBlock *ram_block;
@@ -2272,6 +2275,26 @@ void memory_region_set_size(MemoryRegion *mr, uint64_t 
size);
 void memory_region_set_alias_offset(MemoryRegion *mr,
 hwaddr offset);
 
+/*
+ * memory_region_set_alias_unmergeable: Turn a memory region alias unmergeable
+ *
+ * Mark a memory region alias unmergeable, resulting in multiple adjacent
+ * aliasas to the same memory region not getting merged into one memory section
+ * when simplifying the address space and notifying memory listeners.
+ *
+ * Primarily useful on aliases to RAM regions; the target use case is
+ * splitting a RAM memory region via aliases into multiple memslots and
+ * dynamically (un)mapping the aliases into another container memory region.
+ * As resulting memory sections don't cover multiple aliases, memory listeners
+ * will be notified about adding/removing separate aliases, resulting in
+ * individual memslots in KVM, vhost, vfio,... that can be added/removed
+ * atomically when mapping/unmapping the corresponding alias.
+ *
+ * @mr: the #MemoryRegion to be updated
+ * @unmergeable: whether to mark the #MemoryRegion unmergeable
+ */
+void memory_region_set_alias_unmergeable(MemoryRegion *mr, bool unmergeable);
+
 /**
  * memory_region_present: checks if an address relative to a @container
  * translates into #MemoryRegion within @container
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 3a261

Re: [PATCH 00/16] fdt: Make OF_BOARD a boolean option

2021-10-27 Thread Tom Rini

On Tue, Oct 26, 2021 at 09:46:38AM +0300, Ilias Apalodimas wrote:
> Hi Simon,
> 
> A bit late to the party, sorry!
> 
> [...]
> 
> > >
> > > I really want to see what the binary case looks like since we could then
> > > kill off rpi_{3,3_b,4}_defconfig and I would need to see if we could
> > > then also do a rpi_arm32_defconfig too.
> > >
> > > I want to see less device trees in U-Boot sources, if they can come
> > > functionally correct from the hardware/our caller.
> > >
> > > And I'm not seeing how we make use of "U-Boot /config" if we also don't
> > > use the device tree from build time at run time, ignoring the device
> > > tree provided to us at run time by the caller.
> > 
> > Firstly I should say that I find building firmware very messy and
> > confusing these days. Lots of things to build and it's hard to find
> > the instructions. It doesn't have to be that way, but if we carry on
> > as we are, it will continue to be messy and in five years you will
> > need a Ph.D and a lucky charm to boot on any modern board. My
> > objective here is to simplify things, bringing some consistency to the
> > different components. Binman was one effort there. I feel that putting
> > at least the U-Boot house in order, in my role as devicetree
> > maintainer (and as author of devicetree support in U-Boot back in
> > 2011), is the next step.
> > 
> > If we set things up correctly and agree on the bindings, devicetree
> > can be the unifying configuration mechanism through the whole of
> > firmware (except for very early bits) and into the OS, this will set
> > us up very well to deal with the complexity that is coming.
> > 
> > Anyway, here are the mental steps that I've gone through over the past
> > two months:
> > 
> > Step 1: At present, some people think U-Boot is not even allowed to
> > have its own nodes/properties in the DT. It is an abuse of the
> > devicetree standard, like the /chosen node but with less history. We
> > should sacrifice efficiency, expedience and expandability on the altar
> > of 'devicetree is a hardware description'. How do we get over that
> > one? Wel, I just think we need to accept that U-Boot uses devicetree
> > for its own purposes, as well as for booting the OS. I am not saying
> > it always has to have those properties, but with existing features
> > like verified boot, SPL as well as complex firmware images where
> > U-Boot needs to be able to find things in the image, it is essential.
> > So let's just assume that we need this everywhere, since we certainly
> > need it in at least some places.
> > 
> > (stop reading here if you disagree, because nothing below will make
> > any sense...you can still use U-Boot v2011.06 which doesn't have
> > OF_CONTROL :-)
> 
> Having U-Boot keep it's *internal* config state in DTs is fine.  Adding
> that to the DTs that are copied over from linux isn't imho.  There are
> various reasons for that.  First of all syncing device trees is a huge pain
> and that's probably one of the main reasons our DTs are out of sync for a
> large number of boards.

This re-sync is only a pain because:
1. Some platforms have been modifying the core dts files LIKE THEY ARE
   NOT SUPPOSED TO.
2. DTS files are getting closer to being the super stable API that has
   been promised now that there's validation tools.

Some SoCs, like stm32 are doing an amazing job and keeping things in
sync, every release.  Others like NXP are violating rule #1.  Still
others like some TI platforms get bit by #2 (I solved one of these, and
need to cycle back to the one you and I talked about on IRC a while
back, I bet it's another node name dash changed to underbar).

> The point is this was fine in 2011 were we had SPL only,  but the reality
> today is completely different.  There's previous stage boot loaders (and
> enough cases were vendors prefer those over SPL).  If that bootloader needs
> to use it's own device tree for whatever reason,  imposing restrictions on
> it wrt to the device tree it has to include,  and require them to have
> knowledge of U-Boot and it's internal config mechanism makes no sense not
> to mention it doesn't scale at all.

If you are passing the full device tree around, a few more
nodes/properties aren't going to make the situation worse.  If we're
talking about a 60 kilobyte blob one more kilobyte isn't where we call
the line, especially since if we wait another 6 months it'll be a 62
kilobyte file coming in from Linux instead.

> > Step 2: Assume U-Boot has its own nodes/properties. How do they get
> > there? Well, we have u-boot.dtsi files for that (the 2016 patch
> > "6d427c6b1fa binman: Automatically include a U-Boot .dtsi file"), we
> > have binman definitions, etc. So we need a way to overlay those things
> > into the DT. We already support this for in-tree DTs, so IMO this is
> > easy. Just require every board to have an in-tree DT. It helps with
> > discoverability and documentation, anyway. That is this series.
> 
> Again, the board might decide

[PATCH v1 02/12] vhost: Return number of free memslots

2021-10-27 Thread David Hildenbrand

Let's return the number of free slots instead of only checking if there
is a free slot. Required to support memory devices that consume multiple
memslots.

Signed-off-by: David Hildenbrand 
---
 hw/mem/memory-device.c| 2 +-
 hw/virtio/vhost-stub.c| 2 +-
 hw/virtio/vhost.c | 4 ++--
 include/hw/virtio/vhost.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/mem/memory-device.c b/hw/mem/memory-device.c
index 9045ead33e..7f76a09e57 100644
--- a/hw/mem/memory-device.c
+++ b/hw/mem/memory-device.c
@@ -77,7 +77,7 @@ static void memory_device_check_addable(MachineState *ms, 
uint64_t size,
 error_setg(errp, "hypervisor has no free memory slots left");
 return;
 }
-if (!vhost_has_free_slot()) {
+if (!vhost_get_free_memslots()) {
 error_setg(errp, "a used vhost backend has no free memory slots left");
 return;
 }
diff --git a/hw/virtio/vhost-stub.c b/hw/virtio/vhost-stub.c
index c175148fce..fe111e5e45 100644
--- a/hw/virtio/vhost-stub.c
+++ b/hw/virtio/vhost-stub.c
@@ -2,7 +2,7 @@
 #include "hw/virtio/vhost.h"
 #include "hw/virtio/vhost-user.h"
 
-bool vhost_has_free_slot(void)
+unsigned int vhost_get_free_memslots(void)
 {
 return true;
 }
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 437347ad01..2707972870 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -48,7 +48,7 @@ static unsigned int used_memslots;
 static QLIST_HEAD(, vhost_dev) vhost_devices =
 QLIST_HEAD_INITIALIZER(vhost_devices);
 
-bool vhost_has_free_slot(void)
+unsigned int vhost_get_free_memslots(void)
 {
 unsigned int slots_limit = ~0U;
 struct vhost_dev *hdev;
@@ -57,7 +57,7 @@ bool vhost_has_free_slot(void)
 unsigned int r = hdev->vhost_ops->vhost_backend_memslots_limit(hdev);
 slots_limit = MIN(slots_limit, r);
 }
-return slots_limit > used_memslots;
+return slots_limit - used_memslots;
 }
 
 static void vhost_dev_sync_region(struct vhost_dev *dev,
diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index 3fa0b554ef..9d59fc1404 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -130,7 +130,7 @@ uint64_t vhost_get_features(struct vhost_dev *hdev, const 
int *feature_bits,
 uint64_t features);
 void vhost_ack_features(struct vhost_dev *hdev, const int *feature_bits,
 uint64_t features);
-bool vhost_has_free_slot(void);
+unsigned int vhost_get_free_memslots(void);
 
 int vhost_net_set_backend(struct vhost_dev *hdev,
   struct vhost_vring_file *file);
-- 
2.31.1

[PATCH 3/3] watchdog: remove select_watchdog_action

2021-10-27 Thread Paolo Bonzini

Instead of invoking select_watchdog_action from both HMP and command line,
go directly from HMP to QMP and use QemuOpts as the intermediary for the
command line.

This makes -watchdog-action explicitly a shortcut for "-action watchdog",
so that "-watchdog-action" and "-action watchdog" override each other
based on the position on the command line; previously, "-action watchdog"
always won.

Signed-off-by: Paolo Bonzini 
---
 hw/watchdog/watchdog.c| 14 --
 include/sysemu/watchdog.h |  1 -
 monitor/misc.c| 15 ---
 softmmu/vl.c  | 10 +-
 4 files changed, 17 insertions(+), 23 deletions(-)

diff --git a/hw/watchdog/watchdog.c b/hw/watchdog/watchdog.c
index 0e98ffb73f..1437e6c5b6 100644
--- a/hw/watchdog/watchdog.c
+++ b/hw/watchdog/watchdog.c
@@ -76,20 +76,6 @@ int select_watchdog(const char *p)
 return 1;
 }
 
-int select_watchdog_action(const char *p)
-{
-int action;
-char *qapi_value;
-
-qapi_value = g_ascii_strdown(p, -1);
-action = qapi_enum_parse(&WatchdogAction_lookup, qapi_value, -1, NULL);
-g_free(qapi_value);
-if (action < 0)
-return -1;
-qmp_watchdog_set_action(action, &error_abort);
-return 0;
-}
-
 WatchdogAction get_watchdog_action(void)
 {
 return watchdog_action;
diff --git a/include/sysemu/watchdog.h b/include/sysemu/watchdog.h
index a08d16380d..d2d4901dbb 100644
--- a/include/sysemu/watchdog.h
+++ b/include/sysemu/watchdog.h
@@ -37,7 +37,6 @@ typedef struct WatchdogTimerModel WatchdogTimerModel;
 
 /* in hw/watchdog.c */
 int select_watchdog(const char *p);
-int select_watchdog_action(const char *action);
 WatchdogAction get_watchdog_action(void);
 void watchdog_add_model(WatchdogTimerModel *model);
 void watchdog_perform_action(void);
diff --git a/monitor/misc.c b/monitor/misc.c
index ffe7966870..f411675e32 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -70,6 +70,7 @@
 #include "qapi/qapi-commands-migration.h"
 #include "qapi/qapi-commands-misc.h"
 #include "qapi/qapi-commands-qom.h"
+#include "qapi/qapi-commands-run-state.h"
 #include "qapi/qapi-commands-trace.h"
 #include "qapi/qapi-init-commands.h"
 #include "qapi/error.h"
@@ -470,10 +471,18 @@ static void hmp_gdbserver(Monitor *mon, const QDict 
*qdict)
 
 static void hmp_watchdog_action(Monitor *mon, const QDict *qdict)
 {
-const char *action = qdict_get_str(qdict, "action");
-if (select_watchdog_action(action) == -1) {
-monitor_printf(mon, "Unknown watchdog action '%s'\n", action);
+Error *err = NULL;
+WatchdogAction action;
+char *qapi_value;
+
+qapi_value = g_ascii_strdown(qdict_get_str(qdict, "action"), -1);
+action = qapi_enum_parse(&WatchdogAction_lookup, qapi_value, -1, &err);
+g_free(qapi_value);
+if (err) {
+hmp_handle_error(mon, err);
+return;
 }
+qmp_watchdog_set_action(action, &error_abort);
 }
 
 static void monitor_printc(Monitor *mon, int c)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 33908860e7..cf01bd5084 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3225,12 +3225,12 @@ void qemu_init(int argc, char **argv, char **envp)
  exit(1);
 }
 break;
-case QEMU_OPTION_watchdog_action:
-if (select_watchdog_action(optarg) == -1) {
-error_report("unknown -watchdog-action parameter");
-exit(1);
-}
+case QEMU_OPTION_watchdog_action: {
+QemuOpts *opts;
+opts = qemu_opts_create(qemu_find_opts("action"), NULL, 0, 
&error_abort);
+qemu_opt_set(opts, "watchdog", optarg, &error_abort);
 break;
+}
 case QEMU_OPTION_parallel:
 add_device_config(DEV_PARALLEL, optarg);
 default_parallel = 0;
-- 
2.31.1

Re: [PATCH 00/16] fdt: Make OF_BOARD a boolean option

2021-10-27 Thread François Ozog

Hi,

On Wed, 27 Oct 2021 at 14:48, Tom Rini  wrote:

> On Fri, Oct 15, 2021 at 12:03:44PM -0600, Simon Glass wrote:
> > Hi all,
> >
> > On Thu, 14 Oct 2021 at 09:28, Tom Rini  wrote:
> > >
> > > On Thu, Oct 14, 2021 at 09:17:52AM -0600, Simon Glass wrote:
> > > > Hi Tom,
> > > >
> > > > On Thu, 14 Oct 2021 at 08:56, Tom Rini  wrote:
> > > > >
> > > > > On Wed, Oct 13, 2021 at 12:06:02PM -0600, Simon Glass wrote:
> > > > > > Hi François,
> > > > > >
> > > > > > On Wed, 13 Oct 2021 at 11:35, François Ozog <
> francois.o...@linaro.org> wrote:
> > > > > > >
> > > > > > > Hi Simon
> > > > > > >
> > > > > > > Le mer. 13 oct. 2021 à 16:49, Simon Glass 
> a écrit :
> > > > > > >>
> > > > > > >> Hi Tom, Bin,François,
> > > > > > >>
> > > > > > >> On Tue, 12 Oct 2021 at 19:34, Tom Rini 
> wrote:
> > > > > > >> >
> > > > > > >> > On Wed, Oct 13, 2021 at 09:29:14AM +0800, Bin Meng wrote:
> > > > > > >> > > Hi Simon,
> > > > > > >> > >
> > > > > > >> > > On Wed, Oct 13, 2021 at 9:01 AM Simon Glass <
> s...@chromium.org> wrote:
> > > > > > >> > > >
> > > > > > >> > > > With Ilias' efforts we have dropped OF_PRIOR_STAGE and
> OF_HOSTFILE so
> > > > > > >> > > > there are only three ways to obtain a devicetree:
> > > > > > >> > > >
> > > > > > >> > > >- OF_SEPARATE - the normal way, where the devicetree
> is built and
> > > > > > >> > > >   appended to U-Boot
> > > > > > >> > > >- OF_EMBED - for development purposes, the
> devicetree is embedded in
> > > > > > >> > > >   the ELF file (also used for EFI)
> > > > > > >> > > >- OF_BOARD - the board figures it out on its own
> > > > > > >> > > >
> > > > > > >> > > > The last one is currently set up so that no devicetree
> is needed at all
> > > > > > >> > > > in the U-Boot tree. Most boards do provide one, but
> some don't. Some
> > > > > > >> > > > don't even provide instructions on how to boot on the
> board.
> > > > > > >> > > >
> > > > > > >> > > > The problems with this approach are documented at [1].
> > > > > > >> > > >
> > > > > > >> > > > In practice, OF_BOARD is not really distinct from
> OF_SEPARATE. Any board
> > > > > > >> > > > can obtain its devicetree at runtime, even it is has a
> devicetree built
> > > > > > >> > > > in U-Boot. This is because U-Boot may be a second-stage
> bootloader and its
> > > > > > >> > > > caller may have a better idea about the hardware
> available in the machine.
> > > > > > >> > > > This is the case with a few QEMU boards, for example.
> > > > > > >> > > >
> > > > > > >> > > > So it makes no sense to have OF_BOARD as a 'choice'. It
> should be an
> > > > > > >> > > > option, available with either OF_SEPARATE or OF_EMBED.
> > > > > > >> > > >
> > > > > > >> > > > This series makes this change, adding various missing
> devicetree files
> > > > > > >> > > > (and placeholders) to make the build work.
> > > > > > >> > >
> > > > > > >> > > Adding device trees that are never used sounds like a
> hack to me.
> > > > > > >> > >
> > > > > > >> > > For QEMU, device tree is dynamically generated on the fly
> based on
> > > > > > >> > > command line parameters, and the device tree you put in
> this series
> > > > > > >> > > has various hardcoded  values which normally do
> not show up
> > > > > > >> > > in hand-written dts files.
> > > > > > >> > >
> > > > > > >> > > I am not sure I understand the whole point of this.
> > > > > > >> >
> > > > > > >> > I am also confused and do not like the idea of adding
> device trees for
> > > > > > >> > platforms that are capable of and can / do have a device
> tree to give us
> > > > > > >> > at run time.
> > > > > > >>
> > > > > > >> (I'll just reply to this one email, since the same points
> applies to
> > > > > > >> all replies I think)
> > > > > > >>
> > > > > > >> I have been thinking about this and discussing it with people
> for a
> > > > > > >> few months now. I've been signalling a change like this for
> over a
> > > > > > >> month now, on U-Boot contributor calls and in discussions
> with Linaro
> > > > > > >> people. I sent a patch (below) to try to explain things. I
> hope it is
> > > > > > >> not a surprise!
> > > > > > >>
> > > > > > >> The issue here is that we need a devicetree in-tree in
> U-Boot, to
> > > > > > >> avoid the mess that has been created by OF_PRIOR_STAGE,
> OF_BOARD,
> > > > > > >> BINMAN_STANDALONE_FDT and to a lesser extent, OF_HOSTFILE.
> Between
> > > > > > >> Ilias' series and this one we can get ourselves on a stronger
> footing.
> > > > > > >> There is just OF_SEPARATE, with OF_EMBED for debugging/ELF
> use.
> > > > > > >> For more context:
> > > > > > >>
> > > > > > >>
> http://patchwork.ozlabs.org/project/uboot/patch/20210919215111.3830278-3-...@chromium.org/
> > > > > > >>
> > > > > > >> BTW I did suggest to QEMU ARM that they support a way of
> adding the
> > > > > > >> u-boot.dtsi but there was not much interest there (in fact the
> > > > > > >> maintainer would prefer there was no special support even for
> booting
> > > > > > >> Linux directl

Re: [PATCH 0/1] gitlab-ci: Only push docker images to mainstream registry from /master

2021-10-27 Thread Philippe Mathieu-Daudé

On 10/27/21 14:34, Daniel P. Berrangé wrote:
> On Wed, Oct 27, 2021 at 02:26:58PM +0200, Gerd Hoffmann wrote:
>>> Notably the latter is more restrictive that git branch names. We could
>>> assume users always have "sensible" branch names that are less than
>>> 128 chars and only alpha-num characters plus dash/underscore. This
>>> would be fine for my personal branch naming, but I wonder if anyone
>>> uses wierd branch names that would violate docker tag name rules ?
>>
>> /me uses slashes in branch names, i.e.
>>
>> queue/$topic for patch queues
>> $hostname/$subject for my development branches.
> 
> Ok, good to know. So that example clearly rules out use of git branch
> names as docker tags.

CI_COMMIT_REF_SLUG should work:

- CI_COMMIT_REF_NAME The branch or tag name for which project is built.

- CI_COMMIT_REF_NAME in lowercase, shortened to 63 bytes, and with
  everything except 0-9 and a-z replaced with -.
  No leading / trailing -. Use in URLs, host names and domain names.

[PULL 5/8] fsdev/p9array.h: check scalar type in P9ARRAY_NEW()

2021-10-27 Thread Christian Schoenebeck

Make sure at compile time that the scalar type of the array
requested to be created via P9ARRAY_NEW() matches the scalar
type of the passed auto reference variable (unique pointer).

Suggested-by: Richard Henderson 
Signed-off-by: Christian Schoenebeck 
Message-Id: 

---
 fsdev/p9array.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/fsdev/p9array.h b/fsdev/p9array.h
index fff946a3d7..6aa25327ca 100644
--- a/fsdev/p9array.h
+++ b/fsdev/p9array.h
@@ -27,6 +27,8 @@
 #ifndef QEMU_P9ARRAY_H
 #define QEMU_P9ARRAY_H
 
+#include "qemu/compiler.h"
+
 /**
  * P9Array provides a mechanism to access arrays in common C-style (e.g. by
  * square bracket [] operator) in conjunction with reference variables that
@@ -149,6 +151,10 @@
  * @param len - amount of array elements to be allocated immediately
  */
 #define P9ARRAY_NEW(scalar_type, auto_var, len) \
+QEMU_BUILD_BUG_MSG( \
+!__builtin_types_compatible_p(scalar_type, typeof(*auto_var)), \
+"P9Array scalar type mismatch" \
+); \
 p9array_new_##scalar_type((&auto_var), len)
 
 #endif /* QEMU_P9ARRAY_H */
-- 
2.20.1

[PATCH v2] vhost-vdpa: Set discarding of RAM broken when initializing the backend

2021-10-27 Thread David Hildenbrand

Similar to VFIO, vDPA will go ahead an map+pin all guest memory. Memory
that used to be discarded will get re-populated and if we
discard+re-access memory after mapping+pinning, the pages mapped into the
vDPA IOMMU will go out of sync with the actual pages mapped into the user
space page tables.

Set discarding of RAM broken such that:
- virtio-mem and vhost-vdpa run mutually exclusive
- virtio-balloon is inhibited and no memory discards will get issued

In the future, we might be able to support coordinated discarding of RAM
as used by virtio-mem and already supported by vfio via the
RamDiscardManager.

Acked-by: Jason Wang 
Cc: Jason Wang 
Cc: Michael S. Tsirkin 
Cc: Cindy Lu 
Signed-off-by: David Hildenbrand 
---

Unfortuantely, v1 never got picked up no matter how "hard" I tried to
ping :)

v1 -> v2:
- Add ack from Jason
- Extend patch description now that RamDiscardManager is upstream.

---
 hw/virtio/vhost-vdpa.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 12661fd5b1..0d8051426c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -331,6 +331,17 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void 
*opaque, Error **errp)
 struct vhost_vdpa *v;
 assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
 trace_vhost_vdpa_init(dev, opaque);
+int ret;
+
+/*
+ * Similar to VFIO, we end up pinning all guest memory and have to
+ * disable discarding of RAM.
+ */
+ret = ram_block_discard_disable(true);
+if (ret) {
+error_report("Cannot set discarding of RAM broken");
+return ret;
+}
 
 v = opaque;
 v->dev = dev;
@@ -442,6 +453,8 @@ static int vhost_vdpa_cleanup(struct vhost_dev *dev)
 memory_listener_unregister(&v->listener);
 
 dev->opaque = NULL;
+ram_block_discard_disable(false);
+
 return 0;
 }
 
-- 
2.31.1

[PATCH 0/3] Deprecate -watchdog, cleanup -watchdog-action

2021-10-27 Thread Paolo Bonzini

-watchdog is a simple wrapper around -device, but it includes quite
a bit of code to register and list available watchdog devices.
Add a category for watchdogs so that they are listed as such in
'-device help', and deprecate it.

For -watchdog-action, make it 100% a shortcut for "-action watchdog",
instead of calling into watchdog.c from the command line parser.
This is a small improvement because "-watchdog-action" and "-action
watchdog" now override each other based on the position on the
command line; previously, "-action watchdog" always won.

Paolo Bonzini (3):
  watchdog: add information from -watchdog help to -device help
  vl: deprecate -watchdog
  watchdog: remove select_watchdog_action

 docs/about/deprecated.rst  |  5 +
 hw/watchdog/sbsa_gwdt.c|  3 ++-
 hw/watchdog/watchdog.c | 14 --
 hw/watchdog/wdt_aspeed.c   |  3 ++-
 hw/watchdog/wdt_diag288.c  |  3 ++-
 hw/watchdog/wdt_i6300esb.c |  3 ++-
 hw/watchdog/wdt_ib700.c|  3 ++-
 hw/watchdog/wdt_imx2.c |  4 ++--
 include/hw/qdev-core.h |  1 +
 include/sysemu/watchdog.h  |  1 -
 monitor/misc.c | 15 ---
 softmmu/qdev-monitor.c |  1 +
 softmmu/vl.c   | 11 ++-
 13 files changed, 37 insertions(+), 30 deletions(-)

-- 
2.31.1

RE: [PATCH v4 36/51] tcg/optimize: Split out fold_xi_to_x

2021-10-27 Thread Luis Fernando Fujita Pires

From: Richard Henderson 
> Pull the "op r, a, i => mov r, a" optimization into a function, and use them 
> in the
> outer-most logical operations.
> 
> Signed-off-by: Richard Henderson 
> ---
>  tcg/optimize.c | 61 +-
>  1 file changed, 26 insertions(+), 35 deletions(-)

Reviewed-by: Luis Pires 

--
Luis Pires
Instituto de Pesquisas ELDORADO
Aviso Legal - Disclaimer

[PATCH 2/3] vl: deprecate -watchdog

2021-10-27 Thread Paolo Bonzini

-watchdog is the same as -device except that it is case insensitive (and it
allows only watchdog devices of course).  Now that "-device help" can list
as such the available watchdog devices, we can deprecate it.

Note that even though -watchdog tries to be case insensitive, it fails
at that: "-watchdog i6300xyz" fails with "Unknown -watchdog device",
but "-watchdog i6300ESB" also fails (when the generated -device option
is processed) with an error "'i6300ESB' is not a valid device model name".
For this reason, the documentation update does not mention the case
insensitivity of -watchdog.

Signed-off-by: Paolo Bonzini 
---
 docs/about/deprecated.rst | 5 +
 softmmu/vl.c  | 1 +
 2 files changed, 6 insertions(+)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 0bed6ecb1d..3540f5754b 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -160,6 +160,11 @@ Use ``-display sdl`` instead.
 
 Use ``-display curses`` instead.
 
+``-watchdog`` (since 6.2)
+'
+
+Use ``-device`` instead.
+
 ``-smp`` ("parameter=0" SMP configurations) (since 6.2)
 '''
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 55ab70eb97..33908860e7 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -3216,6 +3216,7 @@ void qemu_init(int argc, char **argv, char **envp)
 error_report("only one watchdog option may be given");
 exit(1);
 }
+warn_report("-watchdog is deprecated; use -device instead.");
 watchdog = optarg;
 break;
 case QEMU_OPTION_action:
-- 
2.31.1

[PATCH 1/3] watchdog: add information from -watchdog help to -device help

2021-10-27 Thread Paolo Bonzini

List all watchdog devices in a separate category, and populate
their descriptions.

Signed-off-by: Paolo Bonzini 
---
 hw/watchdog/sbsa_gwdt.c| 3 ++-
 hw/watchdog/wdt_aspeed.c   | 3 ++-
 hw/watchdog/wdt_diag288.c  | 3 ++-
 hw/watchdog/wdt_i6300esb.c | 3 ++-
 hw/watchdog/wdt_ib700.c| 3 ++-
 hw/watchdog/wdt_imx2.c | 4 ++--
 include/hw/qdev-core.h | 1 +
 softmmu/qdev-monitor.c | 1 +
 8 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/hw/watchdog/sbsa_gwdt.c b/hw/watchdog/sbsa_gwdt.c
index d0998f8489..e49cacd0e2 100644
--- a/hw/watchdog/sbsa_gwdt.c
+++ b/hw/watchdog/sbsa_gwdt.c
@@ -273,8 +273,9 @@ static void wdt_sbsa_gwdt_class_init(ObjectClass *klass, 
void *data)
 dc->realize = wdt_sbsa_gwdt_realize;
 dc->reset = wdt_sbsa_gwdt_reset;
 dc->hotpluggable = false;
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+set_bit(DEVICE_CATEGORY_WATCHDOG, dc->categories);
 dc->vmsd = &vmstate_sbsa_gwdt;
+dc->desc = "SBSA-compliant generic watchdog device";
 }
 
 static const TypeInfo wdt_sbsa_gwdt_info = {
diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c
index 146ffcd713..6aa6f90b66 100644
--- a/hw/watchdog/wdt_aspeed.c
+++ b/hw/watchdog/wdt_aspeed.c
@@ -293,9 +293,10 @@ static void aspeed_wdt_class_init(ObjectClass *klass, void 
*data)
 dc->desc = "ASPEED Watchdog Controller";
 dc->realize = aspeed_wdt_realize;
 dc->reset = aspeed_wdt_reset;
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+set_bit(DEVICE_CATEGORY_WATCHDOG, dc->categories);
 dc->vmsd = &vmstate_aspeed_wdt;
 device_class_set_props(dc, aspeed_wdt_properties);
+dc->desc = "Aspeed watchdog device";
 }
 
 static const TypeInfo aspeed_wdt_info = {
diff --git a/hw/watchdog/wdt_diag288.c b/hw/watchdog/wdt_diag288.c
index e135a4de8b..9e8882a11c 100644
--- a/hw/watchdog/wdt_diag288.c
+++ b/hw/watchdog/wdt_diag288.c
@@ -122,9 +122,10 @@ static void wdt_diag288_class_init(ObjectClass *klass, 
void *data)
 dc->unrealize = wdt_diag288_unrealize;
 dc->reset = wdt_diag288_reset;
 dc->hotpluggable = false;
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+set_bit(DEVICE_CATEGORY_WATCHDOG, dc->categories);
 dc->vmsd = &vmstate_diag288;
 diag288->handle_timer = wdt_diag288_handle_timer;
+dc->desc = "diag288 device for s390x platform";
 }
 
 static const TypeInfo wdt_diag288_info = {
diff --git a/hw/watchdog/wdt_i6300esb.c b/hw/watchdog/wdt_i6300esb.c
index 4c52e3bb9e..f99a1c9d29 100644
--- a/hw/watchdog/wdt_i6300esb.c
+++ b/hw/watchdog/wdt_i6300esb.c
@@ -476,7 +476,8 @@ static void i6300esb_class_init(ObjectClass *klass, void 
*data)
 k->class_id = PCI_CLASS_SYSTEM_OTHER;
 dc->reset = i6300esb_reset;
 dc->vmsd = &vmstate_i6300esb;
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+set_bit(DEVICE_CATEGORY_WATCHDOG, dc->categories);
+dc->desc = "Intel 6300ESB";
 }
 
 static const TypeInfo i6300esb_info = {
diff --git a/hw/watchdog/wdt_ib700.c b/hw/watchdog/wdt_ib700.c
index 177aaa503f..91d1bdc0da 100644
--- a/hw/watchdog/wdt_ib700.c
+++ b/hw/watchdog/wdt_ib700.c
@@ -140,7 +140,8 @@ static void wdt_ib700_class_init(ObjectClass *klass, void 
*data)
 dc->realize = wdt_ib700_realize;
 dc->reset = wdt_ib700_reset;
 dc->vmsd = &vmstate_ib700;
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+set_bit(DEVICE_CATEGORY_WATCHDOG, dc->categories);
+dc->desc = "iBASE 700";
 }
 
 static const TypeInfo wdt_ib700_info = {
diff --git a/hw/watchdog/wdt_imx2.c b/hw/watchdog/wdt_imx2.c
index a5fb76308f..c3128370b5 100644
--- a/hw/watchdog/wdt_imx2.c
+++ b/hw/watchdog/wdt_imx2.c
@@ -280,8 +280,8 @@ static void imx2_wdt_class_init(ObjectClass *klass, void 
*data)
 dc->realize = imx2_wdt_realize;
 dc->reset = imx2_wdt_reset;
 dc->vmsd = &vmstate_imx2_wdt;
-dc->desc = "i.MX watchdog timer";
-set_bit(DEVICE_CATEGORY_MISC, dc->categories);
+dc->desc = "i.MX2 watchdog timer";
+set_bit(DEVICE_CATEGORY_WATCHDOG, dc->categories);
 }
 
 static const TypeInfo imx2_wdt_info = {
diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 4ff19c714b..0dae5b6f6c 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -26,6 +26,7 @@ typedef enum DeviceCategory {
 DEVICE_CATEGORY_SOUND,
 DEVICE_CATEGORY_MISC,
 DEVICE_CATEGORY_CPU,
+DEVICE_CATEGORY_WATCHDOG,
 DEVICE_CATEGORY_MAX
 } DeviceCategory;
 
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 3df99ce9fc..87dd87e801 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -162,6 +162,7 @@ static void qdev_print_devinfos(bool show_no_user)
 [DEVICE_CATEGORY_SOUND]   = "Sound",
 [DEVICE_CATEGORY_MISC]= "Misc",
 [DEVICE_CATEGORY_CPU] = "CPU",
+[DEVICE_CATEGORY_WATCHDOG]= "Watchdog",
 [DEVICE_CATEGORY_MAX] = "Uncategorized",
 };
 GSList *list, *elt;
-- 
2.31.1

Re: [PATCH v1 02/12] vhost: Return number of free memslots

2021-10-27 Thread Philippe Mathieu-Daudé

On 10/27/21 14:45, David Hildenbrand wrote:
> Let's return the number of free slots instead of only checking if there
> is a free slot. Required to support memory devices that consume multiple
> memslots.
> 
> Signed-off-by: David Hildenbrand 
> ---
>  hw/mem/memory-device.c| 2 +-
>  hw/virtio/vhost-stub.c| 2 +-
>  hw/virtio/vhost.c | 4 ++--
>  include/hw/virtio/vhost.h | 2 +-
>  4 files changed, 5 insertions(+), 5 deletions(-)

> --- a/hw/virtio/vhost-stub.c
> +++ b/hw/virtio/vhost-stub.c
> @@ -2,7 +2,7 @@
>  #include "hw/virtio/vhost.h"
>  #include "hw/virtio/vhost-user.h"
>  
> -bool vhost_has_free_slot(void)
> +unsigned int vhost_get_free_memslots(void)
>  {
>  return true;

   return 0;

>  }

[PULL 0/8] 9p queue 2021-10-27

2021-10-27 Thread Christian Schoenebeck

The following changes since commit 931ce30859176f0f7daac6bac255dae5eb21284e:

  Merge remote-tracking branch 'remotes/dagrh/tags/pull-virtiofs-20211026' into 
staging (2021-10-26 07:38:41 -0700)

are available in the Git repository at:

  https://github.com/cschoenebeck/qemu.git tags/pull-9p-20211027

for you to fetch changes up to 7e985780aaab93d2c5be9b62d8d386568dfb071e:

  9pfs: use P9Array in v9fs_walk() (2021-10-27 14:45:22 +0200)


9pfs: performance fix and cleanup

* First patch fixes suboptimal I/O performance on guest due to previously
  incorrect block size being transmitted to 9p client.

* Subsequent patches are cleanup ones intended to reduce code complexity.


Christian Schoenebeck (8):
  9pfs: fix wrong I/O block size in Rgetattr
  9pfs: deduplicate iounit code
  9pfs: simplify blksize_to_iounit()
  9pfs: introduce P9Array
  fsdev/p9array.h: check scalar type in P9ARRAY_NEW()
  9pfs: make V9fsString usable via P9Array API
  9pfs: make V9fsPath usable via P9Array API
  9pfs: use P9Array in v9fs_walk()

 fsdev/9p-marshal.c |   2 +
 fsdev/9p-marshal.h |   3 +
 fsdev/file-op-9p.h |   2 +
 fsdev/p9array.h| 160 +
 hw/9pfs/9p.c   |  70 +--
 5 files changed, 208 insertions(+), 29 deletions(-)
 create mode 100644 fsdev/p9array.h

Re: [PATCH 00/16] fdt: Make OF_BOARD a boolean option

2021-10-27 Thread Heinrich Schuchardt


On 10/27/21 15:15, François Ozog wrote:

Hi,

On Wed, 27 Oct 2021 at 14:48, Tom Rini > wrote:


On Fri, Oct 15, 2021 at 12:03:44PM -0600, Simon Glass wrote:
 > Hi all,
 >
 > On Thu, 14 Oct 2021 at 09:28, Tom Rini mailto:tr...@konsulko.com>> wrote:
 > >
 > > On Thu, Oct 14, 2021 at 09:17:52AM -0600, Simon Glass wrote:
 > > > Hi Tom,
 > > >
 > > > On Thu, 14 Oct 2021 at 08:56, Tom Rini mailto:tr...@konsulko.com>> wrote:
 > > > >
 > > > > On Wed, Oct 13, 2021 at 12:06:02PM -0600, Simon Glass wrote:
 > > > > > Hi François,
 > > > > >
 > > > > > On Wed, 13 Oct 2021 at 11:35, François Ozog
mailto:francois.o...@linaro.org>> wrote:
 > > > > > >
 > > > > > > Hi Simon
 > > > > > >
 > > > > > > Le mer. 13 oct. 2021 à 16:49, Simon Glass
mailto:s...@chromium.org>> a écrit :
 > > > > > >>
 > > > > > >> Hi Tom, Bin,François,
 > > > > > >>
 > > > > > >> On Tue, 12 Oct 2021 at 19:34, Tom Rini
mailto:tr...@konsulko.com>> wrote:
 > > > > > >> >
 > > > > > >> > On Wed, Oct 13, 2021 at 09:29:14AM +0800, Bin Meng
wrote:
 > > > > > >> > > Hi Simon,
 > > > > > >> > >
 > > > > > >> > > On Wed, Oct 13, 2021 at 9:01 AM Simon Glass
mailto:s...@chromium.org>> wrote:
 > > > > > >> > > >
 > > > > > >> > > > With Ilias' efforts we have dropped
OF_PRIOR_STAGE and OF_HOSTFILE so
 > > > > > >> > > > there are only three ways to obtain a devicetree:
 > > > > > >> > > >
 > > > > > >> > > >    - OF_SEPARATE - the normal way, where the
devicetree is built and
 > > > > > >> > > >       appended to U-Boot
 > > > > > >> > > >    - OF_EMBED - for development purposes, the
devicetree is embedded in
 > > > > > >> > > >       the ELF file (also used for EFI)
 > > > > > >> > > >    - OF_BOARD - the board figures it out on its own
 > > > > > >> > > >
 > > > > > >> > > > The last one is currently set up so that no
devicetree is needed at all
 > > > > > >> > > > in the U-Boot tree. Most boards do provide one,
but some don't. Some
 > > > > > >> > > > don't even provide instructions on how to boot
on the board.
 > > > > > >> > > >
 > > > > > >> > > > The problems with this approach are documented
at [1].
 > > > > > >> > > >
 > > > > > >> > > > In practice, OF_BOARD is not really distinct
from OF_SEPARATE. Any board
 > > > > > >> > > > can obtain its devicetree at runtime, even it is
has a devicetree built
 > > > > > >> > > > in U-Boot. This is because U-Boot may be a
second-stage bootloader and its
 > > > > > >> > > > caller may have a better idea about the hardware
available in the machine.
 > > > > > >> > > > This is the case with a few QEMU boards, for
example.
 > > > > > >> > > >
 > > > > > >> > > > So it makes no sense to have OF_BOARD as a
'choice'. It should be an
 > > > > > >> > > > option, available with either OF_SEPARATE or
OF_EMBED.
 > > > > > >> > > >
 > > > > > >> > > > This series makes this change, adding various
missing devicetree files
 > > > > > >> > > > (and placeholders) to make the build work.
 > > > > > >> > >
 > > > > > >> > > Adding device trees that are never used sounds
like a hack to me.
 > > > > > >> > >
 > > > > > >> > > For QEMU, device tree is dynamically generated on
the fly based on
 > > > > > >> > > command line parameters, and the device tree you
put in this series
 > > > > > >> > > has various hardcoded  values which
normally do not show up
 > > > > > >> > > in hand-written dts files.
 > > > > > >> > >
 > > > > > >> > > I am not sure I understand the whole point of this.
 > > > > > >> >
 > > > > > >> > I am also confused and do not like the idea of
adding device trees for
 > > > > > >> > platforms that are capable of and can / do have a
device tree to give us
 > > > > > >> > at run time.
 > > > > > >>
 > > > > > >> (I'll just reply to this one email, since the same
points applies to
 > > > > > >> all replies I think)
 > > > > > >>
 > > > > > >> I have been thinking about this and discussing it with
people for a
 > > > > > >> few months now. I've been signalling a change like
this for over a
 > > > > > >> month now, on U-Boot contributor calls and in
discussions with Linaro
 > > > > > >> people. I sent a patch (below) to try to explain
things. I hope it is
 > > > > > >> not a surprise!
 > > > > > >>
 > > > > > >> The issue here is that we need a devicetree in-tree in
U-Boot, to
 > > > > > >> avoid the mess that has been created by
OF_PRIOR_STAGE, OF_BOARD,
 > > > > > >> BINMAN_STANDALONE_FDT and to a lesser extent,
OF_HOSTFILE. Between
 > > > > > >> Ilias' series and this one we can get ourselves on a
stronger footing.
 > > > > > >> Th

[PULL 2/8] 9pfs: deduplicate iounit code

2021-10-27 Thread Christian Schoenebeck

Remove redundant code that translates host fileystem's block
size into 9p client (guest side) block size.

Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
Message-Id: 
<129bb71d5119e61d335f1e3107e472e4beea223a.1632758315.git.qemu_...@crudebyte.com>
---
 hw/9pfs/9p.c | 41 -
 1 file changed, 20 insertions(+), 21 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 708b030474..5c57344667 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1262,18 +1262,26 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, 
V9fsPath *path,
 #define P9_STATS_ALL   0x3fffULL /* Mask for All fields above */
 
 
-static int32_t stat_to_iounit(const V9fsPDU *pdu, const struct stat *stbuf)
+/**
+ * Convert host filesystem's block size into an appropriate block size for
+ * 9p client (guest OS side). The value returned suggests an "optimum" block
+ * size for 9p I/O, i.e. to maximize performance.
+ *
+ * @pdu: 9p client request
+ * @blksize: host filesystem's block size
+ */
+static int32_t blksize_to_iounit(const V9fsPDU *pdu, int32_t blksize)
 {
 int32_t iounit = 0;
 V9fsState *s = pdu->s;
 
 /*
- * iounit should be multiples of st_blksize (host filesystem block size)
+ * iounit should be multiples of blksize (host filesystem block size)
  * as well as less than (client msize - P9_IOHDRSZ)
  */
-if (stbuf->st_blksize) {
-iounit = stbuf->st_blksize;
-iounit *= (s->msize - P9_IOHDRSZ) / stbuf->st_blksize;
+if (blksize) {
+iounit = blksize;
+iounit *= (s->msize - P9_IOHDRSZ) / blksize;
 }
 if (!iounit) {
 iounit = s->msize - P9_IOHDRSZ;
@@ -1281,6 +1289,11 @@ static int32_t stat_to_iounit(const V9fsPDU *pdu, const 
struct stat *stbuf)
 return iounit;
 }
 
+static int32_t stat_to_iounit(const V9fsPDU *pdu, const struct stat *stbuf)
+{
+return blksize_to_iounit(pdu, stbuf->st_blksize);
+}
+
 static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf,
 V9fsStatDotl *v9lstat)
 {
@@ -1899,23 +1912,9 @@ out_nofid:
 static int32_t coroutine_fn get_iounit(V9fsPDU *pdu, V9fsPath *path)
 {
 struct statfs stbuf;
-int32_t iounit = 0;
-V9fsState *s = pdu->s;
+int err = v9fs_co_statfs(pdu, path, &stbuf);
 
-/*
- * iounit should be multiples of f_bsize (host filesystem block size
- * and as well as less than (client msize - P9_IOHDRSZ))
- */
-if (!v9fs_co_statfs(pdu, path, &stbuf)) {
-if (stbuf.f_bsize) {
-iounit = stbuf.f_bsize;
-iounit *= (s->msize - P9_IOHDRSZ) / stbuf.f_bsize;
-}
-}
-if (!iounit) {
-iounit = s->msize - P9_IOHDRSZ;
-}
-return iounit;
+return blksize_to_iounit(pdu, (err >= 0) ? stbuf.f_bsize : 0);
 }
 
 static void coroutine_fn v9fs_open(void *opaque)
-- 
2.20.1

[PULL 3/8] 9pfs: simplify blksize_to_iounit()

2021-10-27 Thread Christian Schoenebeck

Use QEMU_ALIGN_DOWN() macro to reduce code and to make it
more human readable.

Suggested-by: Philippe Mathieu-Daudé 
Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: 

---
 hw/9pfs/9p.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index 5c57344667..e874899ef5 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1280,8 +1280,7 @@ static int32_t blksize_to_iounit(const V9fsPDU *pdu, 
int32_t blksize)
  * as well as less than (client msize - P9_IOHDRSZ)
  */
 if (blksize) {
-iounit = blksize;
-iounit *= (s->msize - P9_IOHDRSZ) / blksize;
+iounit = QEMU_ALIGN_DOWN(s->msize - P9_IOHDRSZ, blksize);
 }
 if (!iounit) {
 iounit = s->msize - P9_IOHDRSZ;
-- 
2.20.1

[PULL 1/8] 9pfs: fix wrong I/O block size in Rgetattr

2021-10-27 Thread Christian Schoenebeck

When client sent a 9p Tgetattr request then the wrong I/O block
size value was returned by 9p server; instead of host file
system's I/O block size it should rather return an I/O block
size according to 9p session's 'msize' value, because the value
returned to client should be an "optimum" block size for I/O
(i.e. to maximize performance), it should not reflect the actual
physical block size of the underlying storage media.

The I/O block size of a host filesystem is typically 4k, so the
value returned was far too low for good 9p I/O performance.

This patch adds stat_to_iounit() with a similar approach as the
existing get_iounit() function.

Signed-off-by: Christian Schoenebeck 
Reviewed-by: Greg Kurz 
Message-Id: 
---
 hw/9pfs/9p.c | 21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/hw/9pfs/9p.c b/hw/9pfs/9p.c
index c857b31321..708b030474 100644
--- a/hw/9pfs/9p.c
+++ b/hw/9pfs/9p.c
@@ -1262,6 +1262,25 @@ static int coroutine_fn stat_to_v9stat(V9fsPDU *pdu, 
V9fsPath *path,
 #define P9_STATS_ALL   0x3fffULL /* Mask for All fields above */
 
 
+static int32_t stat_to_iounit(const V9fsPDU *pdu, const struct stat *stbuf)
+{
+int32_t iounit = 0;
+V9fsState *s = pdu->s;
+
+/*
+ * iounit should be multiples of st_blksize (host filesystem block size)
+ * as well as less than (client msize - P9_IOHDRSZ)
+ */
+if (stbuf->st_blksize) {
+iounit = stbuf->st_blksize;
+iounit *= (s->msize - P9_IOHDRSZ) / stbuf->st_blksize;
+}
+if (!iounit) {
+iounit = s->msize - P9_IOHDRSZ;
+}
+return iounit;
+}
+
 static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct stat *stbuf,
 V9fsStatDotl *v9lstat)
 {
@@ -1273,7 +1292,7 @@ static int stat_to_v9stat_dotl(V9fsPDU *pdu, const struct 
stat *stbuf,
 v9lstat->st_gid = stbuf->st_gid;
 v9lstat->st_rdev = stbuf->st_rdev;
 v9lstat->st_size = stbuf->st_size;
-v9lstat->st_blksize = stbuf->st_blksize;
+v9lstat->st_blksize = stat_to_iounit(pdu, stbuf);
 v9lstat->st_blocks = stbuf->st_blocks;
 v9lstat->st_atime_sec = stbuf->st_atime;
 v9lstat->st_atime_nsec = stbuf->st_atim.tv_nsec;
-- 
2.20.1

[PULL 4/8] 9pfs: introduce P9Array

2021-10-27 Thread Christian Schoenebeck

Implements deep auto free of arrays while retaining common C-style
squared bracket access. Main purpose of this API is to get rid of
error prone individual array deallocation pathes in user code, i.e.
turning something like this:

  void doSomething(size_t n) {
  Foo *foos = malloc(n * sizeof(Foo));
  for (...) {
  foos[i].s = malloc(...);
  if (...) {
  goto out;
  }
  }
  out:
  if (...) {
  for (...) {
  /* deep deallocation */
  free(foos[i].s);
  }
  /* array deallocation */
  free(foos);
  }
  }

into something more simple and safer like:

  void doSomething(size_t n) {
  P9ARRAY_REF(Foo) foos = NULL;
  P9ARRAY_NEW(Foo, foos, n);
  for (...) {
  foos[i].s = malloc(...);
  if (...) {
  return; /* array auto freed here */
  }
  }
  /* array auto freed here */
  }

Unlike GArray, P9Array does not require special macros, function
calls or struct member dereferencing to access the individual array
elements:

  C-array = P9Array:   vs.  GArray:

  for (...) {   |   for (...) {
  ... = arr[i].m;   |   ... = g_array_index(arr, Foo, i).m;
  arr[i].m = ... ;  |   g_array_index(arr, Foo, i).m = ... ;
  } |   }

So existing C-style array code can be retained with only very little
changes; basically limited to replacing array allocation call and of
course removing individual array deallocation pathes.

In this initial version P9Array only supports the concept of unique
pointers, i.e. it does not support reference counting. The array (and
all dynamically allocated memory of individual array elements) is auto
freed once execution leaves the scope of the reference variable (unique
pointer) associated with the array.

Internally a flex array struct is used in combination with macros
spanned over a continuous memory space for both the array's meta data
(private) and the actual C-array user data (public):

  struct P9Array##scalar_type {
size_t len;/* private, hidden from user code */
scalar_type first[];   /* public, directly exposed to user code */
  };

Which has the advantage that the compiler automatically takes care
about correct padding, alignment and overall size for all scalar data
types on all systems and that the user space exposed pointer can
directly be translated back and forth between user space C-array
pointer and internal P9Array struct whenever needed, in a type-safe
manner.

This header file is released under MIT license, to allow this file
being used in other C-projects as well. The common QEMU license
GPL2+ might have construed a conflict for other projects.

Signed-off-by: Christian Schoenebeck 
Message-Id: 

---
 fsdev/p9array.h | 154 
 1 file changed, 154 insertions(+)
 create mode 100644 fsdev/p9array.h

diff --git a/fsdev/p9array.h b/fsdev/p9array.h
new file mode 100644
index 00..fff946a3d7
--- /dev/null
+++ b/fsdev/p9array.h
@@ -0,0 +1,154 @@
+/*
+ * P9Array - deep auto free C-array
+ *
+ * Copyright (c) 2021 Crudebyte
+ *
+ * Authors:
+ *   Christian Schoenebeck 
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to 
deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+#ifndef QEMU_P9ARRAY_H
+#define QEMU_P9ARRAY_H
+
+/**
+ * P9Array provides a mechanism to access arrays in common C-style (e.g. by
+ * square bracket [] operator) in conjunction with reference variables that
+ * perform deep auto free of the array when leaving the scope of the auto
+ * reference variable. That means not only is the array itself automatically
+ * freed, but also memory dynamically allocated by the individual array
+ * elements.
+ *
+ * Example:
+ *
+ * Consider the following user struct @c Foo which shall be used as scalar
+ * (element) type of an array:
+ * @code
+ * typedef struct Foo {
+ * int i;
+ * char *s;
+ * } Foo;
+ * @endcode
+ * and ass

Re: [PATCH 00/16] fdt: Make OF_BOARD a boolean option

2021-10-27 Thread Ilias Apalodimas

Hi trying to reply to all at the same time!

On Wed, Oct 27, 2021 at 09:38:40AM -0400, Tom Rini wrote:
> On Wed, Oct 27, 2021 at 03:30:18PM +0200, François Ozog wrote:
> > Hi Tom,
> > 
> > On Wed, 27 Oct 2021 at 14:59, Tom Rini  wrote:
> > 
> > > On Tue, Oct 26, 2021 at 09:46:38AM +0300, Ilias Apalodimas wrote:
> > > > Hi Simon,
> > > >
> > > > A bit late to the party, sorry!
> > > >
> > > > [...]
> > > >
> > > > > >
> > > > > > I really want to see what the binary case looks like since we could
> > > then
> > > > > > kill off rpi_{3,3_b,4}_defconfig and I would need to see if we could
> > > > > > then also do a rpi_arm32_defconfig too.
> > > > > >
> > > > > > I want to see less device trees in U-Boot sources, if they can come
> > > > > > functionally correct from the hardware/our caller.
> > > > > >
> > > > > > And I'm not seeing how we make use of "U-Boot /config" if we also
> > > don't
> > > > > > use the device tree from build time at run time, ignoring the device
> > > > > > tree provided to us at run time by the caller.
> > > > >
> > > > > Firstly I should say that I find building firmware very messy and
> > > > > confusing these days. Lots of things to build and it's hard to find
> > > > > the instructions. It doesn't have to be that way, but if we carry on
> > > > > as we are, it will continue to be messy and in five years you will
> > > > > need a Ph.D and a lucky charm to boot on any modern board. My
> > > > > objective here is to simplify things, bringing some consistency to the
> > > > > different components. Binman was one effort there. I feel that putting
> > > > > at least the U-Boot house in order, in my role as devicetree
> > > > > maintainer (and as author of devicetree support in U-Boot back in
> > > > > 2011), is the next step.
> > > > >
> > > > > If we set things up correctly and agree on the bindings, devicetree
> > > > > can be the unifying configuration mechanism through the whole of
> > > > > firmware (except for very early bits) and into the OS, this will set
> > > > > us up very well to deal with the complexity that is coming.
> > > > >
> > > > > Anyway, here are the mental steps that I've gone through over the past
> > > > > two months:
> > > > >
> > > > > Step 1: At present, some people think U-Boot is not even allowed to
> > > > > have its own nodes/properties in the DT. It is an abuse of the
> > > > > devicetree standard, like the /chosen node but with less history. We
> > > > > should sacrifice efficiency, expedience and expandability on the altar
> > > > > of 'devicetree is a hardware description'. How do we get over that
> > > > > one? Wel, I just think we need to accept that U-Boot uses devicetree
> > > > > for its own purposes, as well as for booting the OS. I am not saying
> > > > > it always has to have those properties, but with existing features
> > > > > like verified boot, SPL as well as complex firmware images where
> > > > > U-Boot needs to be able to find things in the image, it is essential.
> > > > > So let's just assume that we need this everywhere, since we certainly
> > > > > need it in at least some places.
> > > > >
> > > > > (stop reading here if you disagree, because nothing below will make
> > > > > any sense...you can still use U-Boot v2011.06 which doesn't have
> > > > > OF_CONTROL :-)
> > > >
> > > > Having U-Boot keep it's *internal* config state in DTs is fine.  Adding
> > > > that to the DTs that are copied over from linux isn't imho.  There are
> > > > various reasons for that.  First of all syncing device trees is a huge
> > > pain
> > > > and that's probably one of the main reasons our DTs are out of sync for 
> > > > a
> > > > large number of boards.
> > >
> > > This re-sync is only a pain because:
> > > 1. Some platforms have been modifying the core dts files LIKE THEY ARE
> > >NOT SUPPOSED TO.
> > > 2. DTS files are getting closer to being the super stable API that has
> > >been promised now that there's validation tools.

Agree on both, but still this is the reality we have to deal with right now

> > >
> > > Some SoCs, like stm32 are doing an amazing job and keeping things in
> > > sync, every release.  Others like NXP are violating rule #1.
> > 
> > With NXP commitment to SystemReady on some IMX8 boards, I think this is
> > changing,
> > at least for the SystemReady boards.
> 
> I'd really like to see some progress (as would the other non-NXP folks
> working on NXP SoCs) in that regard.
> 
> > > Still
> > > others like some TI platforms get bit by #2 (I solved one of these, and
> > > need to cycle back to the one you and I talked about on IRC a while
> > > back, I bet it's another node name dash changed to underbar).
> > >
> > > > The point is this was fine in 2011 were we had SPL only,  but the 
> > > > reality
> > > > today is completely different.  There's previous stage boot loaders (and
> > > > enough cases were vendors prefer those over SPL).  If that bootloader
> > > needs
> > > > to use it's own device tree for whatever r

1 2 3 4 5 >

1 - 100 of 430 matches

Mail list logo