Re: [PATCH v14 08/11] qapi/s390/cpu topology: change-topology monitor command

2023-01-12 Thread Thomas Huth

On 11/01/2023 11.09, Thomas Huth wrote:

On 05/01/2023 15.53, Pierre Morel wrote:

The modification of the CPU attributes are done through a monitor
commands.


s/commands/command/


It allows to move the core inside the topology tree to optimise
the cache usage in the case the host's hypervizor previously


s/hypervizor/hypervisor/


moved the CPU.

The same command allows to modifiy the CPU attributes modifiers


s/modifiy/modify/


like polarization entitlement and the dedicated attribute to notify
the guest if the host admin modified scheduling or dedication of a vCPU.

With this knowledge the guest has the possibility to optimize the
usage of the vCPUs.


Hmm, who is supposed to call this QMP command in the future? Will there be a 
new daemon monitoring the CPU changes in the host? Or will there be a 
libvirt addition for this? ... Seems like I still miss the big picture here...


Or if this is just about to provide an API for future experiments, I'd 
rather suggest to introduce the new commands with a "x-" prefix to mark them 
as experimental (so we would also not need to go through the deprecation 
process in case they don't work out as expected in the future).


 Thomas




Re: [RFC v4 2/3] memory: add depth assert in address_space_to_flatview

2023-01-12 Thread Chuang Xu

Hi, Peter, Paolo,

On 2023/1/10 10:45 PM, Peter Xu wrote:

On Tue, Jan 10, 2023 at 12:09:41AM -0800, Chuang Xu wrote:

Hi, Peter and Paolo,

Hi, Chuang, Paolo,


I'm sorry I didn't reply to your email in time. I was infected with
COVID-19 two weeks ago, so I couldn't think about the problems discussed
in your email for a long time.😷

On 2023/1/4 1:43 AM, Peter Xu wrote:

Hi, Paolo,

On Wed, Dec 28, 2022 at 09:27:50AM +0100, Paolo Bonzini wrote:

On Fri, 23 Dec 2022, 16:54 Peter Xu wrote:


This is not valid because the transaction could happen in *another*
thread.

In that case memory_region_transaction_depth() will be > 0, but RCU is
needed.

Do you mean the code is wrong, or the comment? Note that the code has
checked rcu_read_locked() where introduced in patch 1, but maybe something
else was missed?


The assertion is wrong. It will succeed even if RCU is unlocked in this
thread but a transaction is in progress in another thread.

IIUC this is the case where the context:

(1) doesn't have RCU read lock held, and,
(2) doesn't have BQL held.

Is it safe at all to reference any flatview in such a context? The thing
is I think the flatview pointer can be freed anytime if both locks are not
taken.


Perhaps you can check (memory_region_transaction_depth() > 0 &&
!qemu_mutex_iothread_locked()) || rcu_read_locked() instead?

What if one thread calls address_space_to_flatview() with BQL held but not
RCU read lock held? I assume it's a legal operation, but it seems to be
able to trigger the assert already?

Thanks,


I'm not sure whether I understand the content of your discussion correctly,
so here I want to explain my understanding firstly.

 From my perspective, Paolo thinks that when thread 1 is in a transaction,
thread 2 will trigger the assertion when accessing the flatview without
holding RCU read lock, although sometimes the thread 2's access to flatview
is legal. So Paolo suggests checking (memory_region_transaction_depth() > 0
&& !qemu_mutex_iothread_locked()) || rcu_read_locked() instead.

And Peter thinks that as long as no thread holds the BQL or RCU read lock,
the old flatview can be released (actually executed by the rcu thread with
BQL held). When thread 1 is in a transaction, if thread 2 accesses the
flatview with BQL held but not RCU read lock held, it's a legal operation. In this
legal case, it seems that both my code and Paolo's code will trigger
assertion.

IIUC your original patch is fine in this case (BQL held, RCU not held), as
long as depth==0.  IMHO what we want to trap here is when BQL held (while
RCU is not) and depth>0 which can cause unpredictable side effect of using
obsolete flatview.


I don't quite understand the side effects of depth>0 when BQL is held (while
RCU is not).

In my perspective, both BQL and RCU can ensure that the flatview will not be
released when the worker thread accesses the flatview, because before the rcu
thread releases the flatview, it will make sure itself holding BQL and the
worker thread not holding RCU. So, whether the depth is 0 or not, as long as
BQL or RCU is held, the worker thread will not access the obsolete flatview
(IIUC 'obsolete' means that flatview is released).



To summarize, the original check didn't consider BQL, and if to consider
BQL I think it should be something like:

   /* Guarantees valid access to the flatview, either lock works */
   assert(BQL_HELD() || RCU_HELD());

   /*
    * Guarantees any BQL holder is not reading obsolete flatview (e.g. when
    * during vm load)
    */
   if (BQL_HELD())
       assert(depth==0);

IIUC it can be merged into:

   assert((BQL_HELD() && depth==0) || RCU_HELD());


IMHO assert(BQL_HELD() || RCU_HELD()) is enough..

Or you think that once a mr transaction is in progress, the old flatview has
been obsolete? If we regard flatview as obsolete when a mr transaction is in
progress, How can RCU ensure that flatview is not obsolete?

What does Paolo think of this check?

Thanks!


I'm not sure if I have a good understanding of your emails? I think
checking (memory_region_transaction_get_depth() == 0 || rcu_read_locked() ||
qemu_mutex_iothread_locked()) should cover the case you discussed.

This seems still problematic too?  Since the assert can pass even if
neither BQL nor RCU is held (as long as depth==0).
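
For illustration only, the three candidate checks discussed in this thread can be compared side by side in a small standalone C sketch. The fields of struct ctx are stand-ins (assumptions) for qemu_mutex_iothread_locked(), rcu_read_locked() and the memory region transaction depth; the function names are made up here and none of this is QEMU code.

```c
#include <assert.h>   /* for assert() checks by a caller */
#include <stdbool.h>

/* Stand-ins for the real lock state; purely illustrative. */
struct ctx {
    bool bql_held;   /* would be qemu_mutex_iothread_locked() */
    bool rcu_held;   /* would be rcu_read_locked() */
    int  depth;      /* would be the memory region transaction depth */
};

/* Merged form: the BQL only counts when no transaction is pending. */
static bool check_merged(struct ctx c)
{
    return (c.bql_held && c.depth == 0) || c.rcu_held;
}

/* Simpler form: either lock is enough, depth is ignored. */
static bool check_either_lock(struct ctx c)
{
    return c.bql_held || c.rcu_held;
}

/* Three-way OR: passes with no lock held as long as depth == 0. */
static bool check_three_way(struct ctx c)
{
    return c.depth == 0 || c.rcu_held || c.bql_held;
}
```

The checks differ in two cases: with no lock held and depth == 0 only the three-way OR passes, and with the BQL held (RCU not held) during a transaction the merged form is the only one that fails.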

Thanks,





Re: [RFC] Notify IRQ sources of level interrupt ack/EOI

2023-01-12 Thread David Woodhouse
On Wed, 2023-01-11 at 12:43 -0700, Alex Williamson wrote:
> On Wed, 11 Jan 2023 19:08:44 +
> David Woodhouse  wrote:
> 
> > On Wed, 2023-01-11 at 11:29 -0700, Alex Williamson wrote:
> > > 
> > > Nice.  IIRC, we ended up with the hack solution we have today in vfio
> > > because there was too much resistance to callbacks that were only
> > > necessary for vfio in the past.  Once we had KVM resampling support,
> > > it simply wasn't worth the effort for a higher latency solution to
> > > fight that battle, so we implemented what could best be described as
> > > a universal workaround embedded in vfio.
> > > 
> > > Clearly a callback is preferable, and yes, that's how we operate with
> > > KVM resampling and unmasking INTx, so in theory plumbing this to our
> > > existing eoi callback and removing the region toggling ought to do
> > > the right thing.  Thanks,  
> > 
> > Well, I'm happy for the Xen support to be a second use case which means
> > it's no longer "only necessary for VFIO", and keep prodding at it if
> > that's going to be useful...
> 
> Welcome aboard.  I take it from your cover letter that non-x86
> architectures would be on your todo list.  Ideally the ack callback
> would simply be a requirement for any implementation of a new interrupt
> controller, but that's where we get into striking a balance of device
> assignment imposing requirements on arbitrary architectures that may or
> may not care, or even support, device assignment.
> 
> This is the... dare I say, elegance of the region access hack.  It's
> obviously not pretty or performant, but it universally provides an
> approximation of the behavior of an emulated device, ie. some form of
> guest access to the device is required to de-assert the interrupt.
> 
> We probably need some way to detect the interrupt controller support
> for the callback mechanism so we can generate an appropriate user
> warning to encourage development of that support and fall back to our
> current hack for some degree of functionality.  Thanks,

The other thing I'd like to do is figure out the semantics of the
callback. Right now I've gone for "return true to clear the level" and
that makes me cringe a little bit because in the VFIO case, the
callback will re-enable the IRQ in the kernel *before* returning true
and the irqchip actually clearing its s->irr.

Which could *theoretically* race with the next interrupt happening and
setting the IRR before it's cleared. Although I don't think that
actually happens in practice in qemu because the eventfd would be
processed in the same thread? But I'm not sure I like it; it feels
wrong.

One option is for the generic processing to go "if there *is* a
callback, zero the IRR first and then call the callback". Which fixes
the above race, but you do rely on the drivers to *actually* reassert
it. Which might be OK because it only has to propagate up one level up
the IRQ link chain at a time — e.g. to the PCI INTx code, which would
then process the given INT[ABCD] line and call back those drivers which
*have* a callback to resample *their* state, then recalculate the
overall level based on the corresponding irq_count.
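
As a rough sketch of the two orderings described above, assuming a toy irq_line structure and a hypothetical ack callback type (neither is an existing QEMU interface), the difference comes down to whether the latched level is cleared before or after the callback runs:

```c
#include <assert.h>   /* for assert() checks by a caller */
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical ack callback: returns true if the level may be cleared. */
typedef bool (*ack_fn)(void *opaque);

struct irq_line {
    bool   irr;      /* latched level, like the irqchip's IRR bit */
    ack_fn ack;      /* NULL if the source registered no callback */
    void  *opaque;
};

/* Ordering 1: the callback runs first, the level is cleared afterwards. */
static void eoi_callback_first(struct irq_line *l)
{
    if (l->ack && l->ack(l->opaque)) {
        l->irr = false;   /* may wipe a level raised inside the callback */
    }
}

/* Ordering 2: the level is cleared first; the source must reassert it. */
static void eoi_clear_first(struct irq_line *l)
{
    if (l->ack) {
        l->irr = false;
        l->ack(l->opaque);   /* the callback may set l->irr again */
    }
}

/* Example source: re-raises the line from inside the callback. */
static bool reassert_and_ack(void *opaque)
{
    struct irq_line *l = opaque;

    l->irr = true;
    return true;
}
```

With a source that reasserts from inside the callback, ordering 1 loses the new level while ordering 2 keeps it, which is the race being worried about here.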

I also wasn't sure if I want to allow calling qemu_set_irq() from the
callback itself, instead of having it return a boolean. That might let
the *callback* worry about when to clear/set the level. It doesn't work
well with shared interrupts in some cases, because the race still
exists if callback B zeroes the IRQ level after callback A already did
so *and* reasserted it. But I think shared interrupts are hosed anyway
if e.g. HPET links to the same GSI that a PCI INTx is routed to; they
overwrite each other's state instead of it being a logical OR of the
two? To share interrupts we need *explicit* muxing like the PCI code
has with its irq_count handling.

This last option has the advantage that it maps directly to the
existing VFIO EOI callback, e.g. vfio_intx_eoi(). It's probably where
I'll start.





[PATCH] hw/misc/sifive_u_otp: Remove the deprecated OTP config with '-drive if=none'

2023-01-12 Thread Thomas Huth
'-drive if=none' is meant for configuring back-end devices only, so this
got marked as deprecated in QEMU 6.2. Users should now only use the new
way with '-drive if=pflash' instead.

Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst   | 6 --
 docs/about/removed-features.rst | 7 +++
 hw/misc/sifive_u_otp.c  | 7 ---
 3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 68d29642d7..bfe8148490 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -87,12 +87,6 @@ as short-form boolean values, and passed to plugins as 
``arg_name=on``.
 However, short-form booleans are deprecated and full explicit ``arg_name=on``
 form is preferred.
 
-``-drive if=none`` for the sifive_u OTP device (since 6.2)
-''
-
-Using ``-drive if=none`` to configure the OTP device of the sifive_u
-RISC-V machine is deprecated. Use ``-drive if=pflash`` instead.
-
 ``-no-hpet`` (since 8.0)
 
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index c918cabd1a..b1cb15f3d9 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -422,6 +422,13 @@ the value is hexadecimal.  That is, '0x20M' should be 
written either as
 ``tty`` and ``parport`` used to be aliases for ``serial`` and ``parallel``
 respectively. The actual backend names should be used instead.
 
+``-drive if=none`` for the sifive_u OTP device (removed in 8.0)
+'''
+
+Using ``-drive if=none`` to configure the OTP device of the sifive_u
+RISC-V machine is deprecated. Use ``-drive if=pflash`` instead.
+
+
 QEMU Machine Protocol (QMP) commands
 
 
diff --git a/hw/misc/sifive_u_otp.c b/hw/misc/sifive_u_otp.c
index 6d7fdb040a..8965f5c22a 100644
--- a/hw/misc/sifive_u_otp.c
+++ b/hw/misc/sifive_u_otp.c
@@ -210,13 +210,6 @@ static void sifive_u_otp_realize(DeviceState *dev, Error 
**errp)
 sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->mmio);
 
 dinfo = drive_get(IF_PFLASH, 0, 0);
-if (!dinfo) {
-dinfo = drive_get(IF_NONE, 0, 0);
-if (dinfo) {
-warn_report("using \"-drive if=none\" for the OTP is deprecated, "
-"use \"-drive if=pflash\" instead.");
-}
-}
 if (dinfo) {
 int ret;
 uint64_t perm;
-- 
2.31.1




Re: [PATCH] hw/misc/sifive_u_otp: Remove the deprecated OTP config with '-drive if=none'

2023-01-12 Thread Thomas Huth

On 12/01/2023 09.29, Thomas Huth wrote:

'-drive if=none' is meant for configuring back-end devices only, so this
got marked as deprecated in QEMU 6.2. Users should now only use the new
way with '-drive if=pflash' instead.

Signed-off-by: Thomas Huth 
---
  docs/about/deprecated.rst   | 6 --
  docs/about/removed-features.rst | 7 +++
  hw/misc/sifive_u_otp.c  | 7 ---
  3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 68d29642d7..bfe8148490 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -87,12 +87,6 @@ as short-form boolean values, and passed to plugins as 
``arg_name=on``.
  However, short-form booleans are deprecated and full explicit ``arg_name=on``
  form is preferred.
  
-``-drive if=none`` for the sifive_u OTP device (since 6.2)

-''
-
-Using ``-drive if=none`` to configure the OTP device of the sifive_u
-RISC-V machine is deprecated. Use ``-drive if=pflash`` instead.
-
  ``-no-hpet`` (since 8.0)
  
  
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst

index c918cabd1a..b1cb15f3d9 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -422,6 +422,13 @@ the value is hexadecimal.  That is, '0x20M' should be 
written either as
  ``tty`` and ``parport`` used to be aliases for ``serial`` and ``parallel``
  respectively. The actual backend names should be used instead.
  
+``-drive if=none`` for the sifive_u OTP device (removed in 8.0)

+'''
+
+Using ``-drive if=none`` to configure the OTP device of the sifive_u
+RISC-V machine is deprecated. Use ``-drive if=pflash`` instead.


-ENOTENOUGHCOFFEEYET

I think I should adjust that description a little bit instead of blindly 
copy-n-pasting it... Sorry. I'll send a v2.


 Thomas





[PATCH v2] hw/misc/sifive_u_otp: Remove the deprecated OTP config with '-drive if=none'

2023-01-12 Thread Thomas Huth
'-drive if=none' is meant for configuring back-end devices only, so this
got marked as deprecated in QEMU 6.2. Users should now only use the new
way with '-drive if=pflash' instead.

Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst   | 6 --
 docs/about/removed-features.rst | 7 +++
 hw/misc/sifive_u_otp.c  | 7 ---
 3 files changed, 7 insertions(+), 13 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 68d29642d7..bfe8148490 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -87,12 +87,6 @@ as short-form boolean values, and passed to plugins as 
``arg_name=on``.
 However, short-form booleans are deprecated and full explicit ``arg_name=on``
 form is preferred.
 
-``-drive if=none`` for the sifive_u OTP device (since 6.2)
-''
-
-Using ``-drive if=none`` to configure the OTP device of the sifive_u
-RISC-V machine is deprecated. Use ``-drive if=pflash`` instead.
-
 ``-no-hpet`` (since 8.0)
 
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index c918cabd1a..6bd0a2b4e4 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -422,6 +422,13 @@ the value is hexadecimal.  That is, '0x20M' should be 
written either as
 ``tty`` and ``parport`` used to be aliases for ``serial`` and ``parallel``
 respectively. The actual backend names should be used instead.
 
+``-drive if=none`` for the sifive_u OTP device (removed in 8.0)
+'''
+
+Use ``-drive if=pflash`` to configure the OTP device of the sifive_u
+RISC-V machine instead.
+
+
 QEMU Machine Protocol (QMP) commands
 
 
diff --git a/hw/misc/sifive_u_otp.c b/hw/misc/sifive_u_otp.c
index 6d7fdb040a..8965f5c22a 100644
--- a/hw/misc/sifive_u_otp.c
+++ b/hw/misc/sifive_u_otp.c
@@ -210,13 +210,6 @@ static void sifive_u_otp_realize(DeviceState *dev, Error 
**errp)
 sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->mmio);
 
 dinfo = drive_get(IF_PFLASH, 0, 0);
-if (!dinfo) {
-dinfo = drive_get(IF_NONE, 0, 0);
-if (dinfo) {
-warn_report("using \"-drive if=none\" for the OTP is deprecated, "
-"use \"-drive if=pflash\" instead.");
-}
-}
 if (dinfo) {
 int ret;
 uint64_t perm;
-- 
2.31.1




[PATCH v6 00/13] vfio/migration: Implement VFIO migration protocol v2

2023-01-12 Thread Avihai Horon
Hello,

Here is v6 of the series.
Thanks for reviewing!

Following VFIO migration protocol v2 acceptance in kernel, this series
implements VFIO migration according to the new v2 protocol and replaces
the now deprecated v1 implementation.

The main differences between v1 and v2 migration protocols are:
1. VFIO device state is represented as a finite state machine instead of
   a bitmap.

2. The migration interface with kernel is done using VFIO_DEVICE_FEATURE
   ioctl and normal read() and write() instead of the migration region
   used in v1.

3. Pre-copy is made optional in v2 protocol. Support for pre-copy will
   be added later on.

Full description of the v2 protocol and the differences from v1 can be
found here [1].
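
To make difference 1 concrete, here is a toy C sketch contrasting a bitmap state with an enumerated state machine. All names and the transition table are assumptions made for illustration; they are not the definitions from linux/vfio.h.

```c
#include <assert.h>   /* for assert() checks by a caller */
#include <stdbool.h>

/* v1 style: independent bits, so odd combinations are representable. */
enum { V1_RUNNING = 1 << 0, V1_SAVING = 1 << 1, V1_RESUMING = 1 << 2 };

/* In a bitmap model, "pre-copy" is a combination of two bits. */
static bool v1_is_precopy(unsigned state)
{
    return (state & V1_RUNNING) && (state & V1_SAVING);
}

/* v2 style: exactly one enumerated state at a time. */
enum mig_state { ST_STOP, ST_RUNNING, ST_STOP_COPY, ST_RESUMING, ST_COUNT };

/* Allowed direct transitions in this toy model (not the kernel's table). */
static const bool allowed[ST_COUNT][ST_COUNT] = {
    [ST_STOP]      = { [ST_RUNNING] = true, [ST_STOP_COPY] = true,
                       [ST_RESUMING] = true },
    [ST_RUNNING]   = { [ST_STOP] = true },
    [ST_STOP_COPY] = { [ST_STOP] = true },
    [ST_RESUMING]  = { [ST_STOP] = true },
};

/* A state machine can reject a move that a bitmap would silently accept. */
static bool can_move(enum mig_state from, enum mig_state to)
{
    return allowed[from][to];
}
```

In this model a request such as ST_RUNNING to ST_RESUMING is simply refused, whereas the bitmap form has no built-in notion of which bit changes are legal.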



Patch list:

Patch 1 updates linux headers so we will have the MIG_DATA_SIZE ioctl.

Patches 2-8 are prep patches fixing bugs, adding QEMUFile function
that will be used later and refactoring v1 protocol code to make it
easier to add v2 protocol.

Patches 9-13 implement v2 protocol and remove v1 protocol.

Thanks.



Changes from v5 [2]:
- Dropped patch #3.
- Simplified patch #5 as per Alex's suggestion.
- Changed qemu_file_get_to_fd() to return -EIO instead of -1, as
  suggested by Cedric.
  Also changed it so now write returns -errno instead of -1 on error.
- Fixed compilation error reported by Cedric.
- Changed vfio_migration_query_flags() to print error message and return
  -errno in error case as suggested by Cedric.
- Added Reviewed-by tags.
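
The error-return convention mentioned above (a negative errno value instead of a bare -1) could look roughly like the following. write_all() is a hypothetical helper written for this note, not qemu_file_get_to_fd() itself.

```c
#include <assert.h>   /* for assert() checks by a caller */
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: write the whole buffer to fd.
 * Returns 0 on success or a negative errno value on failure.
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;        /* interrupted, retry */
            }
            return -errno;       /* propagate the real cause */
        }
        if (n == 0) {
            return -EIO;         /* no progress, report a generic I/O error */
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

A caller can then tell a bad descriptor (-EBADF) from a full disk (-ENOSPC) without consulting errno separately.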



Changes from v4 [3]:
- Rebased on latest master branch.
- Added linux header update to kernel v6.2-rc1.
- Merged preview patches (#13-14) into this series.



Changes from v3 [4]:
- Rebased on latest master branch.

- Dropped patch #1 "migration: Remove res_compatible parameter" as
  it's not mandatory to this series and needs some further discussion.

- Dropped patch #3 "migration: Block migration comment or code is
  wrong" as it has been merged already.

- Addressed overlooked corner case reported by Vladimir in patch #4
  "migration: Simplify migration_iteration_run()".

- Dropped patch #5 "vfio/migration: Fix wrong enum usage" as it has
  been merged already.

- In patch #12 "vfio/migration: Implement VFIO migration protocol v2":
  1. Changed vfio_save_pending() to update res_precopy_only instead of
 res_postcopy_only (as VFIO migration doesn’t support postcopy).
  2. Moved VFIOMigration->data_buffer allocation to vfio_save_setup()
 and its de-allocation to vfio_save_cleanup(), so now it's
 allocated when actually used (during migration and only on source
 side).

- Addressed Alex's comments:
  1. Eliminated code duplication in patch #7 "vfio/migration: Allow
 migration without VFIO IOMMU dirty tracking support".
  2. Removed redundant initialization of vfio_region_info in patch #10
 "vfio/migration: Move migration v1 logic to vfio_migration_init()".
  3. Added comment about VFIO_MIG_DATA_BUFFER_SIZE heuristic (and
 renamed to VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE).
  4. Cast migration structs to their actual types instead of void *.
  5. Return -errno and -EBADF instead of -1 in vfio_migration_set_state().
  6. Set migration->device_state to new_state even in case of data_fd
 out of sync. Although migration will be aborted, setting device
 state succeeded so we should reflect that.
  7. Renamed VFIO_MIG_PENDING_SIZE to VFIO_MIG_STOP_COPY_SIZE, set it
 to 100G and added a comment about the size choice.
  8. Changed vfio_save_block() to return -errno on error.
  9. Squashed Patch #14 to patch #12.
  10. Adjusted migration data buffer size according to MIG_DATA_SIZE
  ioctl.

- In preview patch #17 "vfio/migration: Query device data size in
  vfio_save_pending()" - changed vfio_save_pending() to report
  VFIO_MIG_STOP_COPY_SIZE on any error.
   
- Added another preview patch "vfio/migration: Optimize
  vfio_save_pending()".

- Added ret value on some traces as suggested by David.

- Added Reviewed-By tags.



Changes from v2 [5]:
- Rebased on top of latest master branch.

- Added relevant patches from Juan's RFC [6] with minor changes:
  1. Added Reviewed-by tag to patch #3 in the RFC.
  2. Adjusted patch #6 to work without patch #4 in the RFC.

- Added a new patch "vfio/migration: Fix wrong enum usage" that fixes a
  small bug in v1 code. This patch has been sent a few weeks ago [7] but
  wasn't taken yet.

- Patch #2 (vfio/migration: Skip pre-copy if dirty page tracking is not
  supported):
  1. Dropped this patch and replaced it with
 "vfio/migration: Allow migration without VFIO IOMMU dirty tracking
 support".
 The new patch uses a different approach – instead of skipping
 pre-copy phase completely, QEMU VFIO code will mark RAM dirty
 (instead of kernel). This ensures that current migration behavior
 is not changed and SLA is taken into account.

- Patch #4 (vfio/common: Change vfio_devices_all_running_and_saving()
  logic to equivalent one):
  1. Improved commit message

[PATCH v6 01/13] linux-headers: Update to v6.2-rc1

2023-01-12 Thread Avihai Horon
Update to commit 1b929c02afd3 ("Linux 6.2-rc1").

Signed-off-by: Avihai Horon 
---
 include/standard-headers/drm/drm_fourcc.h |  63 +++-
 include/standard-headers/linux/ethtool.h  |  81 -
 include/standard-headers/linux/fuse.h |  20 +-
 .../linux/input-event-codes.h |   4 +
 include/standard-headers/linux/pci_regs.h |   2 +
 include/standard-headers/linux/virtio_blk.h   |  19 ++
 include/standard-headers/linux/virtio_bt.h|   8 +
 include/standard-headers/linux/virtio_net.h   |   4 +
 linux-headers/asm-arm64/kvm.h |   1 +
 linux-headers/asm-generic/hugetlb_encode.h|  26 +-
 linux-headers/asm-generic/mman-common.h   |   2 +
 linux-headers/asm-mips/mman.h |   2 +
 linux-headers/asm-riscv/kvm.h |   7 +
 linux-headers/asm-x86/kvm.h   |  11 +-
 linux-headers/linux/kvm.h |  32 +-
 linux-headers/linux/psci.h|  14 +
 linux-headers/linux/userfaultfd.h |   4 +
 linux-headers/linux/vfio.h| 278 +-
 18 files changed, 522 insertions(+), 56 deletions(-)

diff --git a/include/standard-headers/drm/drm_fourcc.h 
b/include/standard-headers/drm/drm_fourcc.h
index 48b620cbef..69cab17b38 100644
--- a/include/standard-headers/drm/drm_fourcc.h
+++ b/include/standard-headers/drm/drm_fourcc.h
@@ -98,18 +98,42 @@ extern "C" {
 #define DRM_FORMAT_INVALID 0
 
 /* color index */
+#define DRM_FORMAT_C1  fourcc_code('C', '1', ' ', ' ') /* [7:0] 
C0:C1:C2:C3:C4:C5:C6:C7 1:1:1:1:1:1:1:1 eight pixels/byte */
+#define DRM_FORMAT_C2  fourcc_code('C', '2', ' ', ' ') /* [7:0] 
C0:C1:C2:C3 2:2:2:2 four pixels/byte */
+#define DRM_FORMAT_C4  fourcc_code('C', '4', ' ', ' ') /* [7:0] C0:C1 
4:4 two pixels/byte */
 #define DRM_FORMAT_C8  fourcc_code('C', '8', ' ', ' ') /* [7:0] C */
 
-/* 8 bpp Red */
+/* 1 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D1  fourcc_code('D', '1', ' ', ' ') /* [7:0] 
D0:D1:D2:D3:D4:D5:D6:D7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D2  fourcc_code('D', '2', ' ', ' ') /* [7:0] 
D0:D1:D2:D3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D4  fourcc_code('D', '4', ' ', ' ') /* [7:0] D0:D1 
4:4 two pixels/byte */
+
+/* 8 bpp Darkness (inverse relationship between channel value and brightness) 
*/
+#define DRM_FORMAT_D8  fourcc_code('D', '8', ' ', ' ') /* [7:0] D */
+
+/* 1 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R1  fourcc_code('R', '1', ' ', ' ') /* [7:0] 
R0:R1:R2:R3:R4:R5:R6:R7 1:1:1:1:1:1:1:1 eight pixels/byte */
+
+/* 2 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R2  fourcc_code('R', '2', ' ', ' ') /* [7:0] 
R0:R1:R2:R3 2:2:2:2 four pixels/byte */
+
+/* 4 bpp Red (direct relationship between channel value and brightness) */
+#define DRM_FORMAT_R4  fourcc_code('R', '4', ' ', ' ') /* [7:0] R0:R1 
4:4 two pixels/byte */
+
+/* 8 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R8  fourcc_code('R', '8', ' ', ' ') /* [7:0] R */
 
-/* 10 bpp Red */
+/* 10 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R10 fourcc_code('R', '1', '0', ' ') /* [15:0] x:R 
6:10 little endian */
 
-/* 12 bpp Red */
+/* 12 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R12 fourcc_code('R', '1', '2', ' ') /* [15:0] x:R 
4:12 little endian */
 
-/* 16 bpp Red */
+/* 16 bpp Red (direct relationship between channel value and brightness) */
 #define DRM_FORMAT_R16 fourcc_code('R', '1', '6', ' ') /* [15:0] R 
little endian */
 
 /* 16 bpp RG */
@@ -204,7 +228,9 @@ extern "C" {
 #define DRM_FORMAT_VYUYfourcc_code('V', 'Y', 'U', 'Y') /* 
[31:0] Y1:Cb0:Y0:Cr0 8:8:8:8 little endian */
 
 #define DRM_FORMAT_AYUVfourcc_code('A', 'Y', 'U', 'V') /* 
[31:0] A:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_AVUYfourcc_code('A', 'V', 'U', 'Y') /* [31:0] 
A:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_XYUVfourcc_code('X', 'Y', 'U', 'V') /* [31:0] 
X:Y:Cb:Cr 8:8:8:8 little endian */
+#define DRM_FORMAT_XVUYfourcc_code('X', 'V', 'U', 'Y') /* [31:0] 
X:Cr:Cb:Y 8:8:8:8 little endian */
 #define DRM_FORMAT_VUY888  fourcc_code('V', 'U', '2', '4') /* [23:0] 
Cr:Cb:Y 8:8:8 little endian */
 #define DRM_FORMAT_VUY101010   fourcc_code('V', 'U', '3', '0') /* Y followed 
by U then V, 10:10:10. Non-linear modifier only */
 
@@ -717,6 +743,35 @@ extern "C" {
  */
 #define DRM_FORMAT_MOD_VIVANTE_SPLIT_SUPER_TILED fourcc_mod_code(VIVANTE, 

[PATCH v6 06/13] vfio/common: Change vfio_devices_all_running_and_saving() logic to equivalent one

2023-01-12 Thread Avihai Horon
vfio_devices_all_running_and_saving() is used to check if migration is
in pre-copy phase. This is done by checking if migration is in setup or
active states and if all VFIO devices are in pre-copy state, i.e.
_SAVING | _RUNNING.

In VFIO migration protocol v2 pre-copy support is made optional. Hence,
a matching v2 protocol pre-copy state can't be used here.

As preparation for adding v2 protocol, change
vfio_devices_all_running_and_saving() logic such that it doesn't use the
VFIO pre-copy state.

The new equivalent logic checks if migration is in active state and if
all VFIO devices are in running state [1]. No functional changes
intended.

[1] Note that checking if migration is in setup or active states and if
all VFIO devices are in running state doesn't guarantee that we are in
pre-copy phase, thus we check if migration is only in active state.
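
The old and new conditions can be compared with a toy model. The enum values and function names below are assumptions for illustration and do not match the QEMU or kernel definitions.

```c
#include <assert.h>   /* for assert() checks by a caller */
#include <stdbool.h>
#include <stddef.h>

enum { DEV_RUNNING = 1 << 0, DEV_SAVING = 1 << 1 };   /* toy v1-style bits */
enum mig_phase { MIG_NONE, MIG_SETUP, MIG_ACTIVE };   /* toy migration status */

/* Old logic: setup or active, and every device is SAVING | RUNNING. */
static bool all_running_and_saving(enum mig_phase p,
                                   const unsigned *dev, size_t n)
{
    const unsigned want = DEV_RUNNING | DEV_SAVING;

    if (p != MIG_SETUP && p != MIG_ACTIVE) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if ((dev[i] & want) != want) {
            return false;
        }
    }
    return true;
}

/* New logic: active only, and every device is RUNNING. */
static bool all_running_and_mig_active(enum mig_phase p,
                                       const unsigned *dev, size_t n)
{
    if (p != MIG_ACTIVE) {
        return false;
    }
    for (size_t i = 0; i < n; i++) {
        if (!(dev[i] & DEV_RUNNING)) {
            return false;
        }
    }
    return true;
}
```

In this model the new check no longer looks at the SAVING bit, and it returns false during setup, which reflects the point made in note [1].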

Signed-off-by: Avihai Horon 
---
 hw/vfio/common.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index f6dd571549..3a35f4afad 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -40,6 +40,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/migration.h"
+#include "migration/misc.h"
 #include "sysemu/tpm.h"
 
 VFIOGroupList vfio_group_list =
@@ -363,13 +364,16 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer 
*container)
 return true;
 }
 
-static bool vfio_devices_all_running_and_saving(VFIOContainer *container)
+/*
+ * Check if all VFIO devices are running and migration is active, which is
+ * essentially equivalent to the migration being in pre-copy phase.
+ */
+static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 {
 VFIOGroup *group;
 VFIODevice *vbasedev;
-MigrationState *ms = migrate_get_current();
 
-if (!migration_is_setup_or_active(ms->state)) {
+if (!migration_is_active(migrate_get_current())) {
 return false;
 }
 
@@ -381,8 +385,7 @@ static bool 
vfio_devices_all_running_and_saving(VFIOContainer *container)
 return false;
 }
 
-if ((migration->device_state & VFIO_DEVICE_STATE_V1_SAVING) &&
-(migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
+if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
 continue;
 } else {
 return false;
@@ -461,7 +464,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
 };
 
 if (iotlb && container->dirty_pages_supported &&
-vfio_devices_all_running_and_saving(container)) {
+vfio_devices_all_running_and_mig_active(container)) {
 return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
 }
 
@@ -488,7 +491,7 @@ static int vfio_dma_unmap(VFIOContainer *container,
 return -errno;
 }
 
-if (iotlb && vfio_devices_all_running_and_saving(container)) {
+if (iotlb && vfio_devices_all_running_and_mig_active(container)) {
 cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
 tcg_enabled() ? DIRTY_CLIENTS_ALL :
 DIRTY_CLIENTS_NOCODE);
-- 
2.26.3




[PATCH v6 04/13] vfio/migration: Allow migration without VFIO IOMMU dirty tracking support

2023-01-12 Thread Avihai Horon
Currently, if IOMMU of a VFIO container doesn't support dirty page
tracking, migration is blocked. This is because a DMA-able VFIO device
can dirty RAM pages without updating QEMU about it, thus breaking the
migration.

However, this doesn't mean that migration can't be done at all.
In such case, allow migration and let QEMU VFIO code mark all pages
dirty.

This guarantees that all pages that might have gotten dirty are reported
back, and thus guarantees a valid migration even without VFIO IOMMU
dirty tracking support.
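
The fallback idea can be sketched with a toy dirty map. The structure and function below are hypothetical and only model the decision; the real code marks pages via cpu_physical_memory_set_dirty_range() as shown in the diff.

```c
#include <assert.h>   /* for assert() checks by a caller */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NPAGES 8

struct toy_container {
    bool dirty_pages_supported;
    unsigned char hw_dirty[NPAGES];   /* what a tracking IOMMU would report */
};

/* Fill out[] with the pages to resend for [first, first + count). */
static void toy_get_dirty(const struct toy_container *c,
                          size_t first, size_t count, unsigned char *out)
{
    if (!c->dirty_pages_supported) {
        /* No tracking: conservatively treat the whole range as dirty. */
        memset(out + first, 1, count);
        return;
    }
    /* Tracking available: report only what the hardware saw. */
    memcpy(out + first, c->hw_dirty + first, count);
}
```

Marking everything dirty costs extra transfer but can never miss a page a device wrote to, which is why it is a safe fallback.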

The motivation for this patch is the introduction of iommufd [1].
iommufd can directly implement the /dev/vfio/vfio container IOCTLs by
mapping them into its internal ops, allowing the usage of these IOCTLs
over iommufd. However, VFIO IOMMU dirty tracking is not supported by
this VFIO compatibility API.

This patch will allow migration by hosts that use the VFIO compatibility
API and prevent migration regressions caused by the lack of VFIO IOMMU
dirty tracking support.

[1]
https://lore.kernel.org/kvm/0-v6-a196d26f289e+11787-iommufd_...@nvidia.com/

Signed-off-by: Avihai Horon 
---
 hw/vfio/common.c| 20 ++--
 hw/vfio/migration.c |  3 +--
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 130e5d1dc7..f6dd571549 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -488,6 +488,12 @@ static int vfio_dma_unmap(VFIOContainer *container,
 return -errno;
 }
 
+if (iotlb && vfio_devices_all_running_and_saving(container)) {
+cpu_physical_memory_set_dirty_range(iotlb->translated_addr, size,
+tcg_enabled() ? DIRTY_CLIENTS_ALL :
+DIRTY_CLIENTS_NOCODE);
+}
+
 return 0;
 }
 
@@ -1201,6 +1207,10 @@ static void vfio_set_dirty_page_tracking(VFIOContainer 
*container, bool start)
 .argsz = sizeof(dirty),
 };
 
+if (!container->dirty_pages_supported) {
+return;
+}
+
 if (start) {
 dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
 } else {
@@ -1236,6 +1246,13 @@ static int vfio_get_dirty_bitmap(VFIOContainer 
*container, uint64_t iova,
 uint64_t pages;
 int ret;
 
+if (!container->dirty_pages_supported) {
+cpu_physical_memory_set_dirty_range(ram_addr, size,
+tcg_enabled() ? DIRTY_CLIENTS_ALL :
+DIRTY_CLIENTS_NOCODE);
+return 0;
+}
+
 dbitmap = g_malloc0(sizeof(*dbitmap) + sizeof(*range));
 
 dbitmap->argsz = sizeof(*dbitmap) + sizeof(*range);
@@ -1409,8 +1426,7 @@ static void vfio_listener_log_sync(MemoryListener 
*listener,
 {
 VFIOContainer *container = container_of(listener, VFIOContainer, listener);
 
-if (vfio_listener_skipped_section(section) ||
-!container->dirty_pages_supported) {
+if (vfio_listener_skipped_section(section)) {
 return;
 }
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 09fe7c1de2..552c2313b2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -860,11 +860,10 @@ int64_t vfio_mig_bytes_transferred(void)
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
-VFIOContainer *container = vbasedev->group->container;
 struct vfio_region_info *info = NULL;
 int ret = -ENOTSUP;
 
-if (!vbasedev->enable_migration || !container->dirty_pages_supported) {
+if (!vbasedev->enable_migration) {
 goto add_blocker;
 }
 
-- 
2.26.3




[PATCH v6 11/13] vfio/migration: Remove VFIO migration protocol v1

2023-01-12 Thread Avihai Horon
Now that the v2 protocol implementation has been added, remove the
deprecated v1 implementation.

Signed-off-by: Avihai Horon 
---
 include/hw/vfio/vfio-common.h |   5 -
 hw/vfio/common.c  |  19 +-
 hw/vfio/migration.c   | 703 +-
 hw/vfio/trace-events  |   9 -
 4 files changed, 24 insertions(+), 712 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 113f8d9208..2aba45887c 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -61,18 +61,13 @@ typedef struct VFIORegion {
 typedef struct VFIOMigration {
 struct VFIODevice *vbasedev;
 VMChangeStateEntry *vm_state;
-VFIORegion region;
-uint32_t device_state_v1;
-int vm_running;
 Notifier migration_state;
 NotifierWithReturn migration_data;
-uint64_t pending_bytes;
 enum vfio_device_mig_state device_state;
 int data_fd;
 void *data_buffer;
 size_t data_buffer_size;
 uint64_t stop_copy_size;
-bool v2;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index dcaa77d2a8..9a0dbee6b4 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,14 +355,7 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 return false;
 }
 
-if (!migration->v2 &&
-(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
-(migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
-return false;
-}
-
-if (migration->v2 &&
-(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
 (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
  migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
 return false;
@@ -393,14 +386,8 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 return false;
 }
 
-if (!migration->v2 &&
-migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
-continue;
-}
-
-if (migration->v2 &&
-(migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
- migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
+if (migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P) {
 continue;
 } else {
 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 04f4397212..7688c83127 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -142,220 +142,6 @@ static int vfio_migration_set_state(VFIODevice *vbasedev,
 return 0;
 }
 
-static inline int vfio_mig_access(VFIODevice *vbasedev, void *val, int count,
-  off_t off, bool iswrite)
-{
-int ret;
-
-ret = iswrite ? pwrite(vbasedev->fd, val, count, off) :
-pread(vbasedev->fd, val, count, off);
-if (ret < count) {
-error_report("vfio_mig_%s %d byte %s: failed at offset 0x%"
- HWADDR_PRIx", err: %s", iswrite ? "write" : "read", count,
- vbasedev->name, off, strerror(errno));
-return (ret < 0) ? ret : -EINVAL;
-}
-return 0;
-}
-
-static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
-   off_t off, bool iswrite)
-{
-int ret, done = 0;
-__u8 *tbuf = buf;
-
-while (count) {
-int bytes = 0;
-
-if (count >= 8 && !(off % 8)) {
-bytes = 8;
-} else if (count >= 4 && !(off % 4)) {
-bytes = 4;
-} else if (count >= 2 && !(off % 2)) {
-bytes = 2;
-} else {
-bytes = 1;
-}
-
-ret = vfio_mig_access(vbasedev, tbuf, bytes, off, iswrite);
-if (ret) {
-return ret;
-}
-
-count -= bytes;
-done += bytes;
-off += bytes;
-tbuf += bytes;
-}
-return done;
-}
-
-#define vfio_mig_read(f, v, c, o)   vfio_mig_rw(f, (__u8 *)v, c, o, false)
-#define vfio_mig_write(f, v, c, o)  vfio_mig_rw(f, (__u8 *)v, c, o, true)
-
-#define VFIO_MIG_STRUCT_OFFSET(f)   \
- offsetof(struct vfio_device_migration_info, f)
-/*
- * Change the device_state register for device @vbasedev. Bits set in @mask
- * are preserved, bits set in @value are set, and bits not set in either @mask
- * or @value are cleared in device_state. If the register cannot be accessed,
- * the resulting state would be invalid, or the device enters an error state,
- * an error is returned.
- */
-
-static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
-

[PATCH v6 02/13] migration: No save_live_pending() method uses the QEMUFile parameter

2023-01-12 Thread Avihai Horon
From: Juan Quintela 

So remove it everywhere.

Signed-off-by: Juan Quintela 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Dr. David Alan Gilbert 
---
 include/migration/register.h   | 3 +--
 migration/savevm.h | 3 +--
 hw/s390x/s390-stattrib.c   | 2 +-
 hw/vfio/migration.c| 3 +--
 migration/block-dirty-bitmap.c | 3 +--
 migration/block.c  | 2 +-
 migration/migration.c  | 4 ++--
 migration/ram.c| 2 +-
 migration/savevm.c | 7 +++
 9 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/include/migration/register.h b/include/migration/register.h
index c1dcff0f90..eb6266a877 100644
--- a/include/migration/register.h
+++ b/include/migration/register.h
@@ -46,8 +46,7 @@ typedef struct SaveVMHandlers {
 
 /* This runs outside the iothread lock!  */
 int (*save_setup)(QEMUFile *f, void *opaque);
-void (*save_live_pending)(QEMUFile *f, void *opaque,
-  uint64_t threshold_size,
+void (*save_live_pending)(void *opaque, uint64_t threshold_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only);
diff --git a/migration/savevm.h b/migration/savevm.h
index 6461342cb4..6dec468cc3 100644
--- a/migration/savevm.h
+++ b/migration/savevm.h
@@ -40,8 +40,7 @@ void qemu_savevm_state_cleanup(void);
 void qemu_savevm_state_complete_postcopy(QEMUFile *f);
 int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
bool inactivate_disks);
-void qemu_savevm_state_pending(QEMUFile *f, uint64_t max_size,
-   uint64_t *res_precopy_only,
+void qemu_savevm_state_pending(uint64_t max_size, uint64_t *res_precopy_only,
uint64_t *res_compatible,
uint64_t *res_postcopy_only);
 void qemu_savevm_send_ping(QEMUFile *f, uint32_t value);
diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 9eda1c3b2a..a553a1e850 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -182,7 +182,7 @@ static int cmma_save_setup(QEMUFile *f, void *opaque)
 return 0;
 }
 
-static void cmma_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void cmma_save_pending(void *opaque, uint64_t max_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only)
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c74453e0b5..e1413ac90c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -456,8 +456,7 @@ static void vfio_save_cleanup(void *opaque)
 trace_vfio_save_cleanup(vbasedev->name);
 }
 
-static void vfio_save_pending(QEMUFile *f, void *opaque,
-  uint64_t threshold_size,
+static void vfio_save_pending(void *opaque, uint64_t threshold_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 283017d7d3..ffc433cd11 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -761,8 +761,7 @@ static int dirty_bitmap_save_complete(QEMUFile *f, void *opaque)
 return 0;
 }
 
-static void dirty_bitmap_save_pending(QEMUFile *f, void *opaque,
-  uint64_t max_size,
+static void dirty_bitmap_save_pending(void *opaque, uint64_t max_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only)
diff --git a/migration/block.c b/migration/block.c
index 4347da1526..b6a98caf78 100644
--- a/migration/block.c
+++ b/migration/block.c
@@ -862,7 +862,7 @@ static int block_save_complete(QEMUFile *f, void *opaque)
 return 0;
 }
 
-static void block_save_pending(QEMUFile *f, void *opaque, uint64_t max_size,
+static void block_save_pending(void *opaque, uint64_t max_size,
uint64_t *res_precopy_only,
uint64_t *res_compatible,
uint64_t *res_postcopy_only)
diff --git a/migration/migration.c b/migration/migration.c
index 52b5d39244..9795d0ec5c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -3751,8 +3751,8 @@ static MigIterateState migration_iteration_run(MigrationState *s)
 uint64_t pending_size, pend_pre, pend_compat, pend_post;
 bool in_postcopy = s->state == MIGRATION_STATUS_POSTCOPY_ACTIVE;
 
-qemu_savevm_state_pending(s->to_dst_file, s->threshold_size, &pend_pre,
-  &pend_compat, &pend_post);
+qemu_savevm

[PATCH v6 13/13] docs/devel: Align VFIO migration docs to v2 protocol

2023-01-12 Thread Avihai Horon
Now that VFIO migration protocol v2 has been implemented and the v1
protocol has been removed, update the documentation to match the v2
protocol.

Signed-off-by: Avihai Horon 
---
 docs/devel/vfio-migration.rst | 68 ---
 1 file changed, 30 insertions(+), 38 deletions(-)

diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
index 9ff6163c88..1d50c2fe5f 100644
--- a/docs/devel/vfio-migration.rst
+++ b/docs/devel/vfio-migration.rst
@@ -7,46 +7,39 @@ the guest is running on source host and restoring this saved state on the
 destination host. This document details how saving and restoring of VFIO
 devices is done in QEMU.
 
-Migration of VFIO devices consists of two phases: the optional pre-copy phase,
-and the stop-and-copy phase. The pre-copy phase is iterative and allows to
-accommodate VFIO devices that have a large amount of data that needs to be
-transferred. The iterative pre-copy phase of migration allows for the guest to
-continue whilst the VFIO device state is transferred to the destination, this
-helps to reduce the total downtime of the VM. VFIO devices can choose to skip
-the pre-copy phase of migration by returning pending_bytes as zero during the
-pre-copy phase.
+Migration of VFIO devices currently consists of a single stop-and-copy phase.
+During the stop-and-copy phase the guest is stopped and the entire VFIO device
+data is transferred to the destination.
+
+The pre-copy phase of migration is currently not supported for VFIO devices.
+Support for VFIO pre-copy will be added later on.
 
 A detailed description of the UAPI for VFIO device migration can be found in
-the comment for the ``vfio_device_migration_info`` structure in the header
-file linux-headers/linux/vfio.h.
+the comment for the ``vfio_device_mig_state`` structure in the header file
+linux-headers/linux/vfio.h.
 
 VFIO implements the device hooks for the iterative approach as follows:
 
-* A ``save_setup`` function that sets up the migration region and sets _SAVING
-  flag in the VFIO device state.
+* A ``save_setup`` function that sets up migration on the source.
 
-* A ``load_setup`` function that sets up the migration region on the
-  destination and sets _RESUMING flag in the VFIO device state.
+* A ``load_setup`` function that sets the VFIO device on the destination in
+  _RESUMING state.
 
 * A ``save_live_pending`` function that reads pending_bytes from the vendor
   driver, which indicates the amount of data that the vendor driver has yet to
   save for the VFIO device.
 
-* A ``save_live_iterate`` function that reads the VFIO device's data from the
-  vendor driver through the migration region during iterative phase.
-
 * A ``save_state`` function to save the device config space if it is present.
 
-* A ``save_live_complete_precopy`` function that resets _RUNNING flag from the
-  VFIO device state and iteratively copies the remaining data for the VFIO
-  device until the vendor driver indicates that no data remains (pending bytes
-  is zero).
+* A ``save_live_complete_precopy`` function that sets the VFIO device in
+  _STOP_COPY state and iteratively copies the data for the VFIO device until
+  the vendor driver indicates that no data remains.
 
 * A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above
+  sections that are generated by the save functions above.
 
 * ``cleanup`` functions for both save and load that perform any migration
-  related cleanup, including unmapping the migration region
+  related cleanup.
 
 
 The VFIO migration code uses a VM state change handler to change the VFIO
@@ -71,13 +64,13 @@ tracking can identify dirtied pages, but any page pinned by the vendor driver
 can also be written by the device. There is currently no device or IOMMU
 support for dirty page tracking in hardware.
 
-By default, dirty pages are tracked when the device is in pre-copy as well as
-stop-and-copy phase. So, a page pinned by the vendor driver will be copied to
-the destination in both phases. Copying dirty pages in pre-copy phase helps
-QEMU to predict if it can achieve its downtime tolerances. If QEMU during
-pre-copy phase keeps finding dirty pages continuously, then it understands
-that even in stop-and-copy phase, it is likely to find dirty pages and can
-predict the downtime accordingly.
+By default, dirty pages are tracked during pre-copy as well as stop-and-copy
+phase. So, a page pinned by the vendor driver will be copied to the destination
+in both phases. Copying dirty pages in pre-copy phase helps QEMU to predict if
+it can achieve its downtime tolerances. If QEMU during pre-copy phase keeps
+finding dirty pages continuously, then it understands that even in stop-and-copy
+phase, it is likely to find dirty pages and can predict the downtime
+accordingly.
 
 QEMU also provides a per device opt-out option ``pre-copy-dirty-page-tracking``
 which disables querying the dirty bitmap durin

[PATCH v6 05/13] migration/qemu-file: Add qemu_file_get_to_fd()

2023-01-12 Thread Avihai Horon
Add a new function, qemu_file_get_to_fd(), that reads data from a
QEMUFile and writes it straight into a given fd.

This will be used later in VFIO migration code.
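
The core of such a helper is a loop that tolerates short writes. A minimal standalone sketch of that loop, assuming a plain in-memory buffer instead of a QEMUFile, might look like the following. `demo_write_all()` is a hypothetical name and is not part of QEMU; the EINTR retry is an addition for illustration and is not present in the patch below.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not QEMU code): write exactly @size bytes from @buf
 * to @fd, looping over short writes. Returns 0 on success or a negative
 * errno value on failure.
 */
int demo_write_all(int fd, const unsigned char *buf, size_t size)
{
    assert(buf != NULL || size == 0);

    while (size) {
        ssize_t rc = write(fd, buf, size);

        if (rc < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted before any data was written */
            }
            return -errno;
        }
        if (rc == 0) {
            return -EIO;        /* no progress; avoid spinning forever */
        }

        /* Advance past the bytes that were actually written. */
        buf += rc;
        size -= (size_t)rc;
    }

    return 0;
}
```

The function in the patch follows the same pattern, except that the source bytes come from the QEMUFile buffer, which is refilled when it runs empty.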

Signed-off-by: Avihai Horon 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 migration/qemu-file.h |  1 +
 migration/qemu-file.c | 34 ++
 2 files changed, 35 insertions(+)

diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index fa13d04d78..9d0155a2a1 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -148,6 +148,7 @@ int qemu_file_shutdown(QEMUFile *f);
 QEMUFile *qemu_file_get_return_path(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
 void qemu_file_set_blocking(QEMUFile *f, bool block);
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size);
 
 void ram_control_before_iterate(QEMUFile *f, uint64_t flags);
 void ram_control_after_iterate(QEMUFile *f, uint64_t flags);
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 2d5f74ffc2..102ab3b439 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -940,3 +940,37 @@ QIOChannel *qemu_file_get_ioc(QEMUFile *file)
 {
 return file->ioc;
 }
+
+/*
+ * Read size bytes from QEMUFile f and write them to fd.
+ */
+int qemu_file_get_to_fd(QEMUFile *f, int fd, size_t size)
+{
+while (size) {
+size_t pending = f->buf_size - f->buf_index;
+ssize_t rc;
+
+if (!pending) {
+rc = qemu_fill_buffer(f);
+if (rc < 0) {
+return rc;
+}
+if (rc == 0) {
+return -EIO;
+}
+continue;
+}
+
+rc = write(fd, f->buf + f->buf_index, MIN(pending, size));
+if (rc < 0) {
+return -errno;
+}
+if (rc == 0) {
+return -EIO;
+}
+f->buf_index += rc;
+size -= rc;
+}
+
+return 0;
+}
-- 
2.26.3




[PATCH v6 07/13] vfio/migration: Move migration v1 logic to vfio_migration_init()

2023-01-12 Thread Avihai Horon
Move the vfio_get_dev_region_info() logic from vfio_migration_probe() to
vfio_migration_init(). This logic is specific to the v1 protocol and
moving it will make it easier to add the v2 protocol implementation later.
No functional changes intended.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 hw/vfio/migration.c  | 30 +++---
 hw/vfio/trace-events |  2 +-
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 552c2313b2..977da64411 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -788,14 +788,14 @@ static void vfio_migration_exit(VFIODevice *vbasedev)
 vbasedev->migration = NULL;
 }
 
-static int vfio_migration_init(VFIODevice *vbasedev,
-   struct vfio_region_info *info)
+static int vfio_migration_init(VFIODevice *vbasedev)
 {
 int ret;
 Object *obj;
 VFIOMigration *migration;
 char id[256] = "";
 g_autofree char *path = NULL, *oid = NULL;
+struct vfio_region_info *info;
 
 if (!vbasedev->ops->vfio_get_object) {
 return -EINVAL;
@@ -806,6 +806,14 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return -EINVAL;
 }
 
+ret = vfio_get_dev_region_info(vbasedev,
+   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
+   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
+   &info);
+if (ret) {
+return ret;
+}
+
 vbasedev->migration = g_new0(VFIOMigration, 1);
 vbasedev->migration->device_state = VFIO_DEVICE_STATE_V1_RUNNING;
 vbasedev->migration->vm_running = runstate_is_running();
@@ -825,6 +833,8 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 goto err;
 }
 
+g_free(info);
+
 migration = vbasedev->migration;
 migration->vbasedev = vbasedev;
 
@@ -847,6 +857,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return 0;
 
 err:
+g_free(info);
 vfio_migration_exit(vbasedev);
 return ret;
 }
@@ -860,34 +871,23 @@ int64_t vfio_mig_bytes_transferred(void)
 
 int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
 {
-struct vfio_region_info *info = NULL;
 int ret = -ENOTSUP;
 
 if (!vbasedev->enable_migration) {
 goto add_blocker;
 }
 
-ret = vfio_get_dev_region_info(vbasedev,
-   VFIO_REGION_TYPE_MIGRATION_DEPRECATED,
-   VFIO_REGION_SUBTYPE_MIGRATION_DEPRECATED,
-   &info);
-if (ret) {
-goto add_blocker;
-}
-
-ret = vfio_migration_init(vbasedev, info);
+ret = vfio_migration_init(vbasedev);
 if (ret) {
 goto add_blocker;
 }
 
-trace_vfio_migration_probe(vbasedev->name, info->index);
-g_free(info);
+trace_vfio_migration_probe(vbasedev->name);
 return 0;
 
 add_blocker:
 error_setg(&vbasedev->migration_blocker,
"VFIO device doesn't support migration");
-g_free(info);
 
 ret = migrate_add_blocker(vbasedev->migration_blocker, errp);
 if (ret < 0) {
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 73dffe9e00..b259dcc644 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -148,7 +148,7 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
 vfio_display_edid_write_error(void) ""
 
 # migration.c
-vfio_migration_probe(const char *name, uint32_t index) " (%s) Region %d"
+vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(const char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
-- 
2.26.3




[PATCH v6 03/13] vfio/migration: Fix NULL pointer dereference bug

2023-01-12 Thread Avihai Horon
As part of its error flow, vfio_vmstate_change() accesses
MigrationState->to_dst_file without any checks. This can cause a NULL
pointer dereference if the error flow is taken and
MigrationState->to_dst_file is not set.

For example, this can happen if the VM is started or stopped outside of
a migration and the vfio_vmstate_change() error flow is taken, as
MigrationState->to_dst_file is not set at that time.

Fix it by checking that MigrationState->to_dst_file is set before using
it.
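
In isolation, the guard amounts to the following pattern. This is a simplified sketch with hypothetical stand-in types (`DemoFile`, `DemoMigrationState`) and functions, not the QEMU structures; the "keep the first error" behaviour of `demo_file_set_error()` is an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for QEMUFile and MigrationState. */
typedef struct {
    int last_error;
} DemoFile;

typedef struct {
    DemoFile *to_dst_file;      /* NULL when no migration is in progress */
} DemoMigrationState;

/* Record @ret on @f, keeping the first error (assumed for this sketch). */
void demo_file_set_error(DemoFile *f, int ret)
{
    assert(f != NULL);
    if (f->last_error == 0) {
        f->last_error = ret;
    }
}

/* Error path: only touch the outgoing file if one actually exists. */
void demo_report_error(DemoMigrationState *s, int ret)
{
    assert(s != NULL);
    if (s->to_dst_file) {
        demo_file_set_error(s->to_dst_file, ret);
    }
}
```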

Fixes: 02a7e71b1e5b ("vfio: Add VM state change handler to know state of VM")
Signed-off-by: Avihai Horon 
Reviewed-by: Juan Quintela 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 hw/vfio/migration.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index e1413ac90c..09fe7c1de2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -743,7 +743,9 @@ static void vfio_vmstate_change(void *opaque, bool running, RunState state)
  */
 error_report("%s: Failed to set device state 0x%x", vbasedev->name,
  (migration->device_state & mask) | value);
-qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+if (migrate_get_current()->to_dst_file) {
+qemu_file_set_error(migrate_get_current()->to_dst_file, ret);
+}
 }
 vbasedev->migration->vm_running = running;
 trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
-- 
2.26.3




[PATCH v6 10/13] vfio/migration: Optimize vfio_save_pending()

2023-01-12 Thread Avihai Horon
During the pre-copy phase of migration, vfio_save_pending() is called
repeatedly and queries the VFIO device for its pending data size.

As long as the pending RAM size is over the threshold, migration can't
converge and be completed. Therefore, during this time there is no
point in querying the VFIO device for its pending data size.

Avoid these unnecessary queries by issuing them in a RAM pre-copy
notifier instead of vfio_save_pending().

This way the VFIO device is queried only when RAM pending data is
below the threshold, when there is an actual chance for migration to
converge.
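
The caching scheme can be sketched as below. This is a simplified standalone model, not the QEMU implementation: `DemoDevice`, `demo_query_stop_copy_size()`, `demo_after_bitmap_sync()` and `demo_save_pending()` are hypothetical names, and the device query is faked with a constant. It only illustrates that the expensive query runs once per sync event, while the frequently called pending hook reads a cached value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Large fallback reported when the query fails (100 GiB, as in the patch). */
#define DEMO_STOP_COPY_FALLBACK (100ULL << 30)

/* Hypothetical device model; not QEMU's VFIODevice/VFIOMigration. */
typedef struct {
    uint64_t stop_copy_size;    /* cached result of the last query */
    unsigned queries;           /* how often the device was queried */
    bool query_fails;           /* simulate a failing query */
} DemoDevice;

/* Stand-in for the (potentially expensive) device query. */
static int demo_query_stop_copy_size(DemoDevice *d, uint64_t *size)
{
    d->queries++;
    if (d->query_fails) {
        return -1;
    }
    *size = 4096;
    return 0;
}

/* Notifier-like hook: refresh the cache once per bitmap sync. */
void demo_after_bitmap_sync(DemoDevice *d)
{
    assert(d != NULL);
    if (demo_query_stop_copy_size(d, &d->stop_copy_size)) {
        /* Report a big pending size so the downtime limit isn't violated. */
        d->stop_copy_size = DEMO_STOP_COPY_FALLBACK;
    }
}

/* Pending hook: called repeatedly, only reads the cached value. */
void demo_save_pending(const DemoDevice *d, uint64_t *res_precopy_only)
{
    assert(d != NULL && res_precopy_only != NULL);
    *res_precopy_only += d->stop_copy_size;
}
```

In this model, any number of `demo_save_pending()` calls between two syncs costs a single device query.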

Signed-off-by: Avihai Horon 
---
 include/hw/vfio/vfio-common.h |  2 ++
 hw/vfio/migration.c   | 56 +++
 hw/vfio/trace-events  |  1 +
 3 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 2ec3346fea..113f8d9208 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -65,11 +65,13 @@ typedef struct VFIOMigration {
 uint32_t device_state_v1;
 int vm_running;
 Notifier migration_state;
+NotifierWithReturn migration_data;
 uint64_t pending_bytes;
 enum vfio_device_mig_state device_state;
 int data_fd;
 void *data_buffer;
 size_t data_buffer_size;
+uint64_t stop_copy_size;
 bool v2;
 } VFIOMigration;
 
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 08f53189fa..04f4397212 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -655,29 +655,19 @@ static void vfio_v1_save_cleanup(void *opaque)
 trace_vfio_save_cleanup(vbasedev->name);
 }
 
-/*
- * Migration size of VFIO devices can be as little as a few KBs or as big as
- * many GBs. This value should be big enough to cover the worst case.
- */
-#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
 static void vfio_save_pending(void *opaque, uint64_t threshold_size,
   uint64_t *res_precopy_only,
   uint64_t *res_compatible,
   uint64_t *res_postcopy_only)
 {
 VFIODevice *vbasedev = opaque;
-uint64_t stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
+VFIOMigration *migration = vbasedev->migration;
 
-/*
- * If getting pending migration size fails, VFIO_MIG_STOP_COPY_SIZE is
- * reported so downtime limit won't be violated.
- */
-vfio_query_stop_copy_size(vbasedev, &stop_copy_size);
-*res_precopy_only += stop_copy_size;
+*res_precopy_only += migration->stop_copy_size;
 
 trace_vfio_save_pending(vbasedev->name, *res_precopy_only,
 *res_postcopy_only, *res_compatible,
-stop_copy_size);
+migration->stop_copy_size);
 }
 
 static void vfio_v1_save_pending(void *opaque, uint64_t threshold_size,
@@ -1104,6 +1094,40 @@ static void vfio_migration_state_notifier(Notifier *notifier, void *data)
 }
 }
 
+/*
+ * Migration size of VFIO devices can be as little as a few KBs or as big as
+ * many GBs. This value should be big enough to cover the worst case.
+ */
+#define VFIO_MIG_STOP_COPY_SIZE (100 * GiB)
+static int vfio_migration_data_notifier(NotifierWithReturn *n, void *data)
+{
+VFIOMigration *migration = container_of(n, VFIOMigration, migration_data);
+VFIODevice *vbasedev = migration->vbasedev;
+PrecopyNotifyData *pnd = data;
+
+if (pnd->reason != PRECOPY_NOTIFY_AFTER_BITMAP_SYNC) {
+return 0;
+}
+
+/* No need to get pending size when finishing migration */
+if (runstate_check(RUN_STATE_FINISH_MIGRATE)) {
+return 0;
+}
+
+if (vfio_query_stop_copy_size(vbasedev, &migration->stop_copy_size)) {
+/*
+ * Failed to get pending migration size. Report big pending size so
+ * downtime limit won't be violated.
+ */
+migration->stop_copy_size = VFIO_MIG_STOP_COPY_SIZE;
+}
+
+trace_vfio_migration_data_notifier(vbasedev->name,
+   migration->stop_copy_size);
+
+return 0;
+}
+
 static void vfio_migration_exit(VFIODevice *vbasedev)
 {
 VFIOMigration *migration = vbasedev->migration;
@@ -1225,6 +1249,9 @@ static int vfio_migration_init(VFIODevice *vbasedev)
 
 migration->vm_state = qdev_add_vm_change_state_handler(
 vbasedev->dev, vfio_vmstate_change, vbasedev);
+
+migration->migration_data.notify = vfio_migration_data_notifier;
+precopy_add_notifier(&migration->migration_data);
 } else {
 register_savevm_live(id, VMSTATE_INSTANCE_ID_ANY, 1,
  &savevm_vfio_v1_handlers, vbasedev);
@@ -1283,6 +1310,9 @@ void vfio_migration_finalize(VFIODevice *vbasedev)
 if (vbasedev->migration) {
 VFIOMigration *migration = vbasedev->migration;
 
+if (migration->v2) {
+precopy_remove_notifier(&migration->migration_data);
+}
 remove_migration_state_change_notifier(&mi

[PATCH v6 12/13] vfio: Alphabetize migration section of VFIO trace-events file

2023-01-12 Thread Avihai Horon
Sort the migration section of the VFIO trace-events file alphabetically
and move two misplaced traces to the common.c section.

Signed-off-by: Avihai Horon 
---
 hw/vfio/trace-events | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 60c49b2ecf..db9cb94952 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -119,6 +119,8 @@ vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Devic
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
 vfio_dma_unmap_overflow_workaround(void) ""
+vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
+vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
 
 # platform.c
 vfio_platform_base_device_init(char *name, int groupid) "%s belongs to group #%d"
@@ -148,20 +150,18 @@ vfio_display_edid_update(uint32_t prefx, uint32_t prefy) "%ux%u"
 vfio_display_edid_write_error(void) ""
 
 # migration.c
+vfio_load_cleanup(const char *name) " (%s)"
+vfio_load_device_config_state(const char *name) " (%s)"
+vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
+vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
+vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64
 vfio_migration_probe(const char *name) " (%s)"
 vfio_migration_set_state(const char *name, const char *state) " (%s) state %s"
-vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
 vfio_migration_state_notifier(const char *name, const char *state) " (%s) state %s"
-vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
+vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
 vfio_save_cleanup(const char *name) " (%s)"
+vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
 vfio_save_device_config_state(const char *name) " (%s)"
 vfio_save_pending(const char *name, uint64_t precopy, uint64_t postcopy, uint64_t compatible, uint64_t stopcopy_size) " (%s) precopy 0x%"PRIx64" postcopy 0x%"PRIx64" compatible 0x%"PRIx64" stopcopy size 0x%"PRIx64
-vfio_save_complete_precopy(const char *name, int ret) " (%s) ret %d"
-vfio_load_device_config_state(const char *name) " (%s)"
-vfio_load_state(const char *name, uint64_t data) " (%s) data 0x%"PRIx64
-vfio_load_state_device_data(const char *name, uint64_t data_size, int ret) " (%s) size 0x%"PRIx64" ret %d"
-vfio_load_cleanup(const char *name) " (%s)"
-vfio_get_dirty_bitmap(int fd, uint64_t iova, uint64_t size, uint64_t bitmap_size, uint64_t start) "container fd=%d, iova=0x%"PRIx64" size= 0x%"PRIx64" bitmap_size=0x%"PRIx64" start=0x%"PRIx64
-vfio_iommu_map_dirty_notify(uint64_t iova_start, uint64_t iova_end) "iommu dirty @ 0x%"PRIx64" - 0x%"PRIx64
-vfio_save_block(const char *name, int data_size) " (%s) data_size %d"
-vfio_migration_data_notifier(const char *name, uint64_t stopcopy_size) " (%s) stopcopy size 0x%"PRIx64
+vfio_save_setup(const char *name, uint64_t data_buffer_size) " (%s) data buffer size 0x%"PRIx64
+vfio_vmstate_change(const char *name, int running, const char *reason, const char *dev_state) " (%s) running %d reason %s device state %s"
-- 
2.26.3




[PATCH v6 08/13] vfio/migration: Rename functions/structs related to v1 protocol

2023-01-12 Thread Avihai Horon
To avoid name collisions, rename functions and structs related to VFIO
migration protocol v1. This will allow the two protocols to co-exist
when v2 protocol is added, until v1 is removed. No functional changes
intended.

Signed-off-by: Avihai Horon 
Reviewed-by: Cédric Le Goater 
---
 include/hw/vfio/vfio-common.h |   2 +-
 hw/vfio/common.c  |   6 +-
 hw/vfio/migration.c   | 106 +-
 hw/vfio/trace-events  |  12 ++--
 4 files changed, 63 insertions(+), 63 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index e573f5a9f1..bbaf72ba00 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -62,7 +62,7 @@ typedef struct VFIOMigration {
 struct VFIODevice *vbasedev;
 VMChangeStateEntry *vm_state;
 VFIORegion region;
-uint32_t device_state;
+uint32_t device_state_v1;
 int vm_running;
 Notifier migration_state;
 uint64_t pending_bytes;
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 3a35f4afad..550b2d7ded 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,8 +355,8 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer *container)
 return false;
 }
 
-if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF)
-&& (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING)) {
+if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+(migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
 return false;
 }
 }
@@ -385,7 +385,7 @@ static bool vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 return false;
 }
 
-if (migration->device_state & VFIO_DEVICE_STATE_V1_RUNNING) {
+if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
 continue;
 } else {
 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 977da64411..9df859f4d3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -107,8 +107,8 @@ static int vfio_mig_rw(VFIODevice *vbasedev, __u8 *buf, size_t count,
  * an error is returned.
  */
 
-static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
-uint32_t value)
+static int vfio_migration_v1_set_state(VFIODevice *vbasedev, uint32_t mask,
+   uint32_t value)
 {
 VFIOMigration *migration = vbasedev->migration;
 VFIORegion *region = &migration->region;
@@ -145,8 +145,8 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
 return ret;
 }
 
-migration->device_state = device_state;
-trace_vfio_migration_set_state(vbasedev->name, device_state);
+migration->device_state_v1 = device_state;
+trace_vfio_migration_v1_set_state(vbasedev->name, device_state);
 return 0;
 }
 
@@ -260,8 +260,8 @@ static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev, uint64_t *size)
 return ret;
 }
 
-static int vfio_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
-uint64_t data_size)
+static int vfio_v1_load_buffer(QEMUFile *f, VFIODevice *vbasedev,
+   uint64_t data_size)
 {
 VFIORegion *region = &vbasedev->migration->region;
 uint64_t data_offset = 0, size, report_size;
@@ -288,7 +288,7 @@ static int vfio_load_buffer(QEMUFile *f, VFIODevice 
*vbasedev,
 data_size = 0;
 }
 
-trace_vfio_load_state_device_data(vbasedev->name, data_offset, size);
+trace_vfio_v1_load_state_device_data(vbasedev->name, data_offset, 
size);
 
 while (size) {
 void *buf;
@@ -394,7 +394,7 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
 return qemu_file_get_error(f);
 }
 
-static void vfio_migration_cleanup(VFIODevice *vbasedev)
+static void vfio_migration_v1_cleanup(VFIODevice *vbasedev)
 {
 VFIOMigration *migration = vbasedev->migration;
 
@@ -405,13 +405,13 @@ static void vfio_migration_cleanup(VFIODevice *vbasedev)
 
 /* -- */
 
-static int vfio_save_setup(QEMUFile *f, void *opaque)
+static int vfio_v1_save_setup(QEMUFile *f, void *opaque)
 {
 VFIODevice *vbasedev = opaque;
 VFIOMigration *migration = vbasedev->migration;
 int ret;
 
-trace_vfio_save_setup(vbasedev->name);
+trace_vfio_v1_save_setup(vbasedev->name);
 
 qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
 
@@ -431,8 +431,8 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
 }
 }
 
-ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
-   VFIO_DEVICE_STATE_V1_SAVING);
+ret = vfio_migration_v1_set_state(vbasedev, VFIO_DEVICE_STATE_MASK,
+ 

Re: [PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL

2023-01-12 Thread Daniel P . Berrangé
On Wed, Jan 11, 2023 at 05:30:18PM -0500, Stefan Berger wrote:
> To prevent getting stuck on waitpid() in case the target process does
> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> process has not changed state until then send a SIGKILL to it.
> 
> Signed-off-by: Stefan Berger 
> ---
>  tests/qtest/libqtest.c | 18 +-
>  1 file changed, 17 insertions(+), 1 deletion(-)

Since this is a test suite and we know our CI system gets very
heavily loaded, I think we should wait more than 10 secs, to
ensure QEMU has time to flush pending I/O in particular which
is most likely to delay things. If you bump the time to 30 secs
then

  Reviewed-by: Daniel P. Berrangé 

> 
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 2fbc3b88f3..362b1f724f 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>  {
>  #ifndef _WIN32
>  pid_t pid;
> +uint64_t end;
> +
> +/* poll for 10s until sending SIGKILL */
> +end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
> +
> +do {
> +pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
> +if (pid != 0) {
> +break;
> +}
> +g_usleep(100 * 1000);
> +} while (g_get_monotonic_time() < end);
> +
> +if (pid == 0) {
> +kill(s->qemu_pid, SIGKILL);
> +TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
> +}
>  
> -TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
>  assert(pid == s->qemu_pid);
>  #else
>  DWORD ret;
> -- 
> 2.39.0
> 

With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH v6 09/13] vfio/migration: Implement VFIO migration protocol v2

2023-01-12 Thread Avihai Horon
Implement the basic mandatory part of VFIO migration protocol v2.
This includes all functionality that is necessary to support
VFIO_MIGRATION_STOP_COPY part of the v2 protocol.

The two protocols, v1 and v2, will co-exist for now; the v1 protocol code
will be removed in the following patches.

There are several main differences between v1 and v2 protocols:
- VFIO device state is now represented as a finite state machine instead
  of a bitmap.

- Migration interface with kernel is now done using VFIO_DEVICE_FEATURE
  ioctl and normal read() and write() instead of the migration region.

- Pre-copy is made optional in v2 protocol. Support for pre-copy will be
  added later on.
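
As a rough illustration of the first difference above (bitmap versus finite state machine), the "is the device running?" test changes shape between the two protocols. This is only a sketch: the `DEMO_*` names and their numeric values are hypothetical stand-ins, not the definitions from the kernel's VFIO header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the v1 bit flags (values are illustrative). */
#define DEMO_V1_RUNNING (1u << 0)
#define DEMO_V1_SAVING  (1u << 1)

/* Hypothetical stand-in for the v2 state enumeration. */
enum demo_mig_state {
    DEMO_STATE_ERROR,
    DEMO_STATE_STOP,
    DEMO_STATE_RUNNING,
    DEMO_STATE_STOP_COPY,
    DEMO_STATE_RESUMING,
    DEMO_STATE_RUNNING_P2P,
};

/* v1: the state is a bitmap, so "running" is a bit test. */
static bool demo_v1_is_running(uint32_t device_state_v1)
{
    return (device_state_v1 & DEMO_V1_RUNNING) != 0;
}

/* v2: the state is a single FSM value, so "running" is an equality
 * test against each state in which the device runs. */
static bool demo_v2_is_running(enum demo_mig_state state)
{
    return state == DEMO_STATE_RUNNING ||
           state == DEMO_STATE_RUNNING_P2P;
}
```

This mirrors the shape of the checks in the hw/vfio/common.c hunks of this patch, where the v1 path tests a bit and the v2 path compares against RUNNING and RUNNING_P2P.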

Detailed information about VFIO migration protocol v2 and its difference
compared to v1 protocol can be found here [1].

[1]
https://lore.kernel.org/all/20220224142024.147653-10-yish...@nvidia.com/

Signed-off-by: Avihai Horon 
---
 include/hw/vfio/vfio-common.h |   5 +
 hw/vfio/common.c  |  19 +-
 hw/vfio/migration.c   | 455 +++---
 hw/vfio/trace-events  |   7 +
 4 files changed, 447 insertions(+), 39 deletions(-)

diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index bbaf72ba00..2ec3346fea 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -66,6 +66,11 @@ typedef struct VFIOMigration {
 int vm_running;
 Notifier migration_state;
 uint64_t pending_bytes;
+enum vfio_device_mig_state device_state;
+int data_fd;
+void *data_buffer;
+size_t data_buffer_size;
+bool v2;
 } VFIOMigration;
 
 typedef struct VFIOAddressSpace {
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 550b2d7ded..dcaa77d2a8 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -355,10 +355,18 @@ static bool vfio_devices_all_dirty_tracking(VFIOContainer 
*container)
 return false;
 }
 
-if ((vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+if (!migration->v2 &&
+(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
 (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING)) {
 return false;
 }
+
+if (migration->v2 &&
+(vbasedev->pre_copy_dirty_page_tracking == ON_OFF_AUTO_OFF) &&
+(migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+ migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
+return false;
+}
 }
 }
 return true;
@@ -385,7 +393,14 @@ static bool 
vfio_devices_all_running_and_mig_active(VFIOContainer *container)
 return false;
 }
 
-if (migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
+if (!migration->v2 &&
+migration->device_state_v1 & VFIO_DEVICE_STATE_V1_RUNNING) {
+continue;
+}
+
+if (migration->v2 &&
+(migration->device_state == VFIO_DEVICE_STATE_RUNNING ||
+ migration->device_state == VFIO_DEVICE_STATE_RUNNING_P2P)) {
 continue;
 } else {
 return false;
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 9df859f4d3..08f53189fa 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
 #include "qemu/cutils.h"
+#include "qemu/units.h"
 #include 
 #include 
 
@@ -44,8 +45,103 @@
 #define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xef13ULL)
 #define VFIO_MIG_FLAG_DEV_DATA_STATE(0xef14ULL)
 
+/*
+ * This is an arbitrary size based on migration of mlx5 devices, where 
typically
+ * total device migration size is on the order of 100s of MB. Testing with
+ * larger values, e.g. 128MB and 1GB, did not show a performance improvement.
+ */
+#define VFIO_MIG_DEFAULT_DATA_BUFFER_SIZE (1 * MiB)
+
 static int64_t bytes_transferred;
 
+static const char *mig_state_to_str(enum vfio_device_mig_state state)
+{
+switch (state) {
+case VFIO_DEVICE_STATE_ERROR:
+return "ERROR";
+case VFIO_DEVICE_STATE_STOP:
+return "STOP";
+case VFIO_DEVICE_STATE_RUNNING:
+return "RUNNING";
+case VFIO_DEVICE_STATE_STOP_COPY:
+return "STOP_COPY";
+case VFIO_DEVICE_STATE_RESUMING:
+return "RESUMING";
+case VFIO_DEVICE_STATE_RUNNING_P2P:
+return "RUNNING_P2P";
+default:
+return "UNKNOWN STATE";
+}
+}
+
+static int vfio_migration_set_state(VFIODevice *vbasedev,
+enum vfio_device_mig_state new_state,
+enum vfio_device_mig_state recover_state)
+{
+VFIOMigration *migration = vbasedev->migration;
+uint64_t buf[DIV_ROUND_UP(sizeof(struct vfio_device_feature) +
+  sizeof(struct vfio_device_feature_mig_state),
+  

Re: [PATCH] hw/misc/sifive_u_otp: Remove the deprecated OTP config with '-drive if=none'

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 09:29, Thomas Huth wrote:

'-drive if=none' is meant for configuring back-end devices only, so this
got marked as deprecated in QEMU 6.2. Users should now only use the new
way with '-drive if=pflash' instead.

Signed-off-by: Thomas Huth 
---
  docs/about/deprecated.rst   | 6 --
  docs/about/removed-features.rst | 7 +++
  hw/misc/sifive_u_otp.c  | 7 ---
  3 files changed, 7 insertions(+), 13 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v2] hw/misc/sifive_u_otp: Remove the deprecated OTP config with '-drive if=none'

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 09:39, Thomas Huth wrote:

'-drive if=none' is meant for configuring back-end devices only, so this
got marked as deprecated in QEMU 6.2. Users should now only use the new
way with '-drive if=pflash' instead.

Signed-off-by: Thomas Huth 
---
  docs/about/deprecated.rst   | 6 --
  docs/about/removed-features.rst | 7 +++
  hw/misc/sifive_u_otp.c  | 7 ---
  3 files changed, 7 insertions(+), 13 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH] hw/misc/sifive_u_otp: Remove the deprecated OTP config with '-drive if=none'

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:15, Philippe Mathieu-Daudé wrote:

On 12/1/23 09:29, Thomas Huth wrote:

'-drive if=none' is meant for configuring back-end devices only, so this
got marked as deprecated in QEMU 6.2. Users should now only use the new
way with '-drive if=pflash' instead.

Signed-off-by: Thomas Huth 
---
  docs/about/deprecated.rst   | 6 --
  docs/about/removed-features.rst | 7 +++
  hw/misc/sifive_u_otp.c  | 7 ---
  3 files changed, 7 insertions(+), 13 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 


-ENOTENOUGHCOFFEEYET I read v2 and meant to reply there :P



Re: [PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL

2023-01-12 Thread Philippe Mathieu-Daudé

On 11/1/23 23:30, Stefan Berger wrote:

To prevent getting stuck on waitpid() in case the target process does
not terminate on SIGTERM, poll on waitpid() for 10s and if the target
process has not changed state until then send a SIGKILL to it.

Signed-off-by: Stefan Berger 
---
  tests/qtest/libqtest.c | 18 +-
  1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 2fbc3b88f3..362b1f724f 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
  {
  #ifndef _WIN32
  pid_t pid;
+uint64_t end;
+
+/* poll for 10s until sending SIGKILL */
+end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;


Maybe we could use getenv() to allow tuning / using different value?


+do {
+pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
+if (pid != 0) {
+break;
+}
+g_usleep(100 * 1000);
+} while (g_get_monotonic_time() < end);
+
+if (pid == 0) {
+kill(s->qemu_pid, SIGKILL);
+TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
+}
  
-TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));

  assert(pid == s->qemu_pid);
  #else
  DWORD ret;





Re: [RESEND PATCH 2/2] migration: report multiFd related thread pid to libvirt

2023-01-12 Thread Jiang Jiacheng via



On 2023/1/12 3:04, Daniel P. Berrangé wrote:
> On Wed, Jan 11, 2023 at 07:00:53PM +, Dr. David Alan Gilbert wrote:
>> * Jiang Jiacheng via (qemu-devel@nongnu.org) wrote:
>>> From: Zheng Chuan 
>>>
>>> Report multiFd related thread pid to libvirt in order to
>>> pin multiFd thread to different cpu.
>>
>> With multifd you may well want to pin different multifd threads
>> to different cores; so you need to include the 'id' and 'name' fields of
>> the multifd thread in the event.
> 
> Are the 'id' / 'name' fields considered stable API for QEMU ?
> 
> IIRC, the mgmt app merely requests the number of multifd threads
> and doesn't assign any identifying names/ids to them, unlike
> iothreads where the mgmt app gives an explicit 'id'.
> 
> 

If the 'id'/'name' of a migration thread is a stable part of the QEMU API,
I think the mgmt app can use that information to implement migration
thread pinning more properly.
We are also considering another option: providing a list of migration
thread information that the mgmt app can actively query, so as to provide
more information and avoid exposing thread PIDs in the mgmt app's public
API.

Thanks
Jiang Jiacheng



Re: [PULL v4 76/83] vhost-user: Support vhost_dev_start

2023-01-12 Thread Maxime Coquelin

Hi Laurent,

On 1/11/23 10:50, Laurent Vivier wrote:

On 1/9/23 11:55, Michael S. Tsirkin wrote:

On Fri, Jan 06, 2023 at 03:21:43PM +0100, Laurent Vivier wrote:

Hi,

it seems this patch breaks vhost-user with DPDK.

See https://bugzilla.redhat.com/show_bug.cgi?id=2155173

it seems QEMU doesn't receive the expected command sequence:

Received unexpected msg type. Expected 22 received 40
Fail to update device iotlb
Received unexpected msg type. Expected 40 received 22
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 1 ring restore failed: -71: Protocol error (71)
Received unexpected msg type. Expected 22 received 11
Fail to update device iotlb
Received unexpected msg type. Expected 11 received 22
vhost VQ 0 ring restore failed: -71: Protocol error (71)
unable to start vhost net: 71: falling back on userspace virtio

It receives VHOST_USER_GET_STATUS (40) when it expects
VHOST_USER_IOTLB_MSG (22),
and VHOST_USER_IOTLB_MSG when it expects VHOST_USER_GET_STATUS,
and VHOST_USER_GET_VRING_BASE (11) when it expects
VHOST_USER_GET_STATUS, and so on.


Any idea?


We only have a single thread on the DPDK side to handle Vhost-user requests.
It reads a request, handles it and replies to it, then reads the next one,
and so on. So I don't think it is possible for the order of the replies to
get mixed up on the DPDK side.

Maybe there are two threads concurrently sending requests on QEMU side?
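
To make that hypothesis concrete, here is a toy model (not QEMU or DPDK code, and not a diagnosis) of a backend that answers strictly in arrival order. If two requests are written before either reply is read, and the replies are then read in the opposite order, the type check fails in the same swapped pattern as the log above. All `demo_*` names are hypothetical; only the message numbers 40 and 22 come from the report.

```c
#include <assert.h>
#include <stdio.h>

#define DEMO_GET_STATUS 40   /* VHOST_USER_GET_STATUS in the report */
#define DEMO_IOTLB_MSG  22   /* VHOST_USER_IOTLB_MSG in the report */

/* A single-threaded backend replies in FIFO order, so the channel is
 * modelled as a queue of reply types. */
struct demo_channel {
    int queue[8];
    int head, tail;
};

/* Send a request; the backend will echo its type back as the reply. */
static void demo_send(struct demo_channel *ch, int request)
{
    ch->queue[ch->tail++] = request;
}

/* Read the next reply and compare its type with what the caller expects.
 * Returns 0 on a match, -1 on a mismatch. */
static int demo_check_reply(struct demo_channel *ch, int expected)
{
    int got = ch->queue[ch->head++];

    if (got != expected) {
        printf("Expected %d received %d\n", expected, got);
        return -1;
    }
    return 0;
}
```

With one sender working in lock step (send, check, send, check) every check passes. If `demo_send()` is called for 40 and then 22, and the caller expecting 22 reads first, both checks fail with the two types swapped.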

Regards,
Maxime


Thanks,
Laurent



So I am guessing it's coming from:

    if (msg.hdr.request != request) {
        error_report("Received unexpected msg type. Expected %d received %d",
                     request, msg.hdr.request);
        return -EPROTO;
    }

in process_message_reply and/or in vhost_user_get_u64.



On 11/7/22 23:53, Michael S. Tsirkin wrote:

From: Yajun Wu 

The motivation of adding vhost-user vhost_dev_start support is to
improve backend configuration speed and reduce live migration VM
downtime.

Today VQ configuration is issued one by one. For virtio net with
multi-queue support, backend needs to update RSS (Receive side
scaling) on every rx queue enable. Updating RSS is time-consuming
(typical time like 7ms).

Implement already defined vhost status and message in the vhost
specification [1].
(a) VHOST_USER_PROTOCOL_F_STATUS
(b) VHOST_USER_SET_STATUS
(c) VHOST_USER_GET_STATUS

Send message VHOST_USER_SET_STATUS with VIRTIO_CONFIG_S_DRIVER_OK for
device start and reset(0) for device stop.

On reception of the DRIVER_OK message, backend can apply the needed setting
only once (instead of incremental) and also utilize parallelism on enabling
queues.

This improves QEMU's live migration downtime with vhost user backend
implementation by a great margin, especially for a large number of VQs (64),
from 800 msec to 250 msec.

[1] https://qemu-project.gitlab.io/qemu/interop/vhost-user.html

Signed-off-by: Yajun Wu 
Acked-by: Parav Pandit 
Message-Id: <20221017064452.1226514-3-yaj...@nvidia.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 


Probably easiest to debug from dpdk side.
Does the problem go away if you disable the feature 
VHOST_USER_PROTOCOL_F_STATUS in dpdk?


Maxime could you help to debug this?

Thanks,
Laurent






Re: [PATCH v2] hw/misc/sifive_u_otp: Remove the deprecated OTP config with '-drive if=none'

2023-01-12 Thread Alistair Francis
On Thu, Jan 12, 2023 at 6:40 PM Thomas Huth  wrote:
>
> '-drive if=none' is meant for configuring back-end devices only, so this
> got marked as deprecated in QEMU 6.2. Users should now only use the new
> way with '-drive if=pflash' instead.
>
> Signed-off-by: Thomas Huth 

Reviewed-by: Alistair Francis 

Alistair

> ---
>  docs/about/deprecated.rst   | 6 --
>  docs/about/removed-features.rst | 7 +++
>  hw/misc/sifive_u_otp.c  | 7 ---
>  3 files changed, 7 insertions(+), 13 deletions(-)
>
> diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
> index 68d29642d7..bfe8148490 100644
> --- a/docs/about/deprecated.rst
> +++ b/docs/about/deprecated.rst
> @@ -87,12 +87,6 @@ as short-form boolean values, and passed to plugins as 
> ``arg_name=on``.
>  However, short-form booleans are deprecated and full explicit ``arg_name=on``
>  form is preferred.
>
> -``-drive if=none`` for the sifive_u OTP device (since 6.2)
> -''
> -
> -Using ``-drive if=none`` to configure the OTP device of the sifive_u
> -RISC-V machine is deprecated. Use ``-drive if=pflash`` instead.
> -
>  ``-no-hpet`` (since 8.0)
>  
>
> diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
> index c918cabd1a..6bd0a2b4e4 100644
> --- a/docs/about/removed-features.rst
> +++ b/docs/about/removed-features.rst
> @@ -422,6 +422,13 @@ the value is hexadecimal.  That is, '0x20M' should be 
> written either as
>  ``tty`` and ``parport`` used to be aliases for ``serial`` and ``parallel``
>  respectively. The actual backend names should be used instead.
>
> +``-drive if=none`` for the sifive_u OTP device (removed in 8.0)
> +'''
> +
> +Use ``-drive if=pflash`` to configure the OTP device of the sifive_u
> +RISC-V machine instead.
> +
> +
>  QEMU Machine Protocol (QMP) commands
>  
>
> diff --git a/hw/misc/sifive_u_otp.c b/hw/misc/sifive_u_otp.c
> index 6d7fdb040a..8965f5c22a 100644
> --- a/hw/misc/sifive_u_otp.c
> +++ b/hw/misc/sifive_u_otp.c
> @@ -210,13 +210,6 @@ static void sifive_u_otp_realize(DeviceState *dev, Error 
> **errp)
>  sysbus_init_mmio(SYS_BUS_DEVICE(dev), &s->mmio);
>
>  dinfo = drive_get(IF_PFLASH, 0, 0);
> -if (!dinfo) {
> -dinfo = drive_get(IF_NONE, 0, 0);
> -if (dinfo) {
> -warn_report("using \"-drive if=none\" for the OTP is deprecated, 
> "
> -"use \"-drive if=pflash\" instead.");
> -}
> -}
>  if (dinfo) {
>  int ret;
>  uint64_t perm;
> --
> 2.31.1
>
>



qemu-system-i386 and general protection

2023-01-12 Thread He Zhe
Hi All,

We are experiencing a general protection fault with qemu-system-i386 as follows.
So far we have reproduced it with kernel v5.15 and the latest v6.2-rc3.

It works well if we revert commit
2f8a21d8ff3af484a37edc8ea61d127ec1529ab5 ("target/i386: Enable AVX cpuid bits
when using TCG"), which has been present since qemu 7.2.

We also tried setting cpu to Broadwell and Icelake-Server and got the same
error.

./qemu-system-i386 -object rng-random,filename=/dev/urandom,id=rng0 -device 
virtio-rng-pci,rng=rng0 -drive file=/tmp/rootfs.ext4,if=virtio,format=raw -usb 
-device usb-tablet -usb -device usb-kbd   -cpu Haswell -machine q35,i8042=off 
-smp 4 -m 8192  -m 8192 -smp cpus=8 -serial mon:stdio -serial null -nographic  
-kernel /tmp/bzImage -append 'root=/dev/vda rw  ip=dhcp console=ttyS0 
console=ttyS1 oprofile.timer=1 tsc=reliable no_timer_check 
rcupdate.rcu_expedited=1 '

[  OK  ] Started System Logging Service.
[  204.194033] traps: named[280] general protection fault ip:b7ef8545 
sp:bf8d5a1c error:0
[  204.198913] audit: type=1701 audit(1673507379.204:2): auid=4294967295 
uid=997 gid=996 ses=4294967295 subj=kernel pid=280 comm="named" ex1
[  204.219923] [ cut here ]
[  204.220455] Bad FPU state detected at restore_fpregs_from_fpstate+0x3a/0x78, 
reinitializing FPU registers.   
[  204.221442] WARNING: CPU: 4 PID: 274 at ../arch/x86/mm/extable.c:127 
fixup_exception+0x3f0/0x41c
[  204.223147] Modules linked in:
[  204.223945] CPU: 4 PID: 274 Comm: rs:main Q:Reg Not tainted 6.2.0-rc3 #1
[  204.224769] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[  204.226061] EIP: fixup_exception+0x3f0/0x41c
[  204.226533] Code: ff ff 8d 74 26 00 0f 0b ba 4c c9 dc d1 e9 10 fd ff ff b1 
01 89 44 24 04 c7 04 24 e0 44 98 d1 88 0d 69 87 cc d1 e8 8c bf
[  204.228038] EAX: 005e EBX: d1aee764 ECX: 0027 EDX: 0001
[  204.228498] ESI: c18efee4 EDI: 000d EBP: c18efe58 ESP: c18efddc
[  204.229102] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 0086
[  204.229662] CR0: 80050033 CR2: bf8d5d54 CR3: 02aaf000 CR4: 001506d0
[  204.230408] Call Trace:
[  204.232101]  ? restore_fpregs_from_fpstate+0x3a/0x78
[  204.232733]  ? __switch_to_asm+0x1c/0xe4
[  204.233028]  ? __schedule+0x28c/0x844
[  204.233362]  ? _raw_spin_lock+0x10/0x34
[  204.233829]  exc_general_protection+0x81/0x340
[  204.234403]  ? futex_wait+0xb4/0x190
[  204.234818]  ? exc_bounds+0xa4/0xa4
[  204.235054]  handle_exception+0x133/0x133
[  204.235629] EIP: restore_fpregs_from_fpstate+0x3a/0x78
[  204.236113] Code: 0a 8d 76 00 db e2 0f 77 db 45 f4 3e 8d 74 26 00 a1 e8 51 
a7 d1 8b 5d f4 21 d0 8b 15 ec 51 a7 d1 8d 7b 40 21 d1 89 ca 04
[  204.236152] EAX: 0007 EBX: c2047200 ECX:  EDX: 
[  204.236171] ESI: c20471c0 EDI: c2047240 EBP: c18eff4c ESP: c18eff40
[  204.236191] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 0046
[  204.236309]  ? exc_bounds+0xa4/0xa4
[  204.236475]  ? exc_bounds+0xa4/0xa4
[  204.240268]  ? restore_fpregs_from_fpstate+0x37/0x78
[FAILED[  204.240752]  switch_fpu_return+0x49/0xe0
[  204.241422]  exit_to_user_mode_prepare+0x189/0x1a0
] Failed to start Berkeley Internet Name Domain (DNS).
[  204.241910]  ? syscall_exit_work+0x10b/0x138
[  204.243209]  syscall_exit_to_user_mode+0x1c/0x38
[  204.243707]  __do_fast_syscall_32+0x56/0xac
[  204.243947]  do_fast_syscall_32+0x32/0x74
[  204.244158]  do_SYSENTER_32+0x15/0x24
[  204.244333]  entry_SYSENTER_32+0x98/0xf1
[  204.244759] EIP: 0xb7f59549
[  204.245200] Code: 03 74 c0 01 10 05 03 74 b8 01 10 06 03 74 b4 01 10 07 03 
74 b0 01 10 08 03 74 d8 01 00 00 00 00 00 51 52 55 89 e5 0f 36
[  204.246900] EAX:  EBX: 012b373c ECX: 0189 EDX: 
[  204.247352] ESI:  EDI:  EBP:  ESP: b69feff0
[  204.247873] DS: 007b ES: 007b FS:  GS: 0033 SS: 007b EFLAGS: 0282
See 'systemctl status named.service' for details.
[  204.248870] ---[ end trace  ]---
[  204.251318] general protection fault, maybe for address 0x0:  [#1] 
PREEMPT SMP
[  204.252076] CPU: 4 PID: 274 Comm: rs:main Q:Reg Tainted: G    W  
6.2.0-rc3 #1
[  204.252685] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.16.1-0-g3208b098f51a-prebuilt.qemu.org 04/01/2014
[  204.253207] EIP: entry_SYSENTER_32+0xe0/0xf1
[  204.253537] Code: 8b 54 24 30 8b 4c 24 3c 8e 64 24 24 5b 83 c4 08 5e 5f 5d 
89 c4 eb 0b 0f 20 d8 0d 00 10 00 00 0f 22 d8 0f ba 34 24 09 96
[  204.254956] EAX:  EBX: 012b373c ECX: b69feff0 EDX: b7f59549
[  204.255282] ESI:  EDI:  EBP:  ESP: ff8b
[  204.255774] DS: 007b ES: 007b FS:  GS: 0033 SS: 0068 EFLAGS: 0282
[  204.256136] CR0: 80050033 CR2: bf8d5d54 CR3: 02aaf000 CR4: 001506d0
[  204.256435] Call Trace:
[  204.257004] Modules linked in:
[  204.257824] ---[ end trace  ]---
[  204.258197] EIP: entry_SYSENTER_32+0x

Re: [PATCH 25/26] tcg: exclude lookup_tb_ptr from helper instrumentation

2023-01-12 Thread Alex Bennée


Richard Henderson  writes:

> On 1/10/23 09:39, Alex Bennée wrote:
>> From: Emilio Cota 
>> It is internal to TCG and therefore we know it does not
>> access guest memory.
>> Related: #1381
>> Signed-off-by: Emilio Cota 
>> Message-Id: <20230108164731.61469-4-c...@braap.org>
>> Signed-off-by: Alex Bennée 
>> ---
>>   tcg/tcg.c | 6 --
>>   1 file changed, 4 insertions(+), 2 deletions(-)
>> diff --git a/tcg/tcg.c b/tcg/tcg.c
>> index da91779890..ee67eefc0c 100644
>> --- a/tcg/tcg.c
>> +++ b/tcg/tcg.c
>> @@ -1652,8 +1652,10 @@ void tcg_gen_callN(void *func, TCGTemp *ret, int 
>> nargs, TCGTemp **args)
>>   op = tcg_op_alloc(INDEX_op_call, total_args);
>> #ifdef CONFIG_PLUGIN
>> -/* detect non-plugin helpers */
>> -if (tcg_ctx->plugin_insn && unlikely(strncmp(info->name, "plugin_", 
>> 7))) {
>> +/* flag helpers that are not internal to TCG */
>> +if (tcg_ctx->plugin_insn &&
>> +strncmp(info->name, "plugin_", 7) &&
>> +strcmp(info->name, "lookup_tb_ptr")) {
>>   tcg_ctx->plugin_insn->calls_helpers = true;
>>   }
>>   #endif
>
> I think this should be detected with
>
>   !(info->flags & TCG_CALL_NO_SIDE_EFFECTS)
>
> i.e., side-effects, which in this case is the possibility of a fault.

That implies that:

DEF_HELPER_FLAGS_2(plugin_vcpu_udata_cb, TCG_CALL_NO_RWG, void, i32, ptr)
DEF_HELPER_FLAGS_4(plugin_vcpu_mem_cb, TCG_CALL_NO_RWG, void, i32, i32, i64, 
ptr)

should be the _SE variants as well right? They do have side-effects but
not in guest state and they shouldn't cause a fault.
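
For reference, the name-based test in the quoted hunk can be written as a small standalone predicate. This is only a sketch of that string logic; `demo_counts_as_helper_call` is a hypothetical name, and the flag-based alternative suggested above is not modelled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true when a helper with this name should set calls_helpers,
 * i.e. it is neither a plugin helper nor the TCG-internal lookup_tb_ptr.
 * strncmp()/strcmp() return non-zero on a mismatch. */
static bool demo_counts_as_helper_call(const char *name)
{
    return strncmp(name, "plugin_", 7) != 0 &&
           strcmp(name, "lookup_tb_ptr") != 0;
}
```

Spelling out the `!= 0` comparisons makes it harder to misread the condition as "name starts with plugin_".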

>
>
> r~


-- 
Alex Bennée
Virtualisation Tech Lead @ Linaro



Re: [PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL

2023-01-12 Thread Daniel P . Berrangé
On Thu, Jan 12, 2023 at 10:18:01AM +0100, Philippe Mathieu-Daudé wrote:
> On 11/1/23 23:30, Stefan Berger wrote:
> > To prevent getting stuck on waitpid() in case the target process does
> > not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> > process has not changed state until then send a SIGKILL to it.
> > 
> > Signed-off-by: Stefan Berger 
> > ---
> >   tests/qtest/libqtest.c | 18 +-
> >   1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> > index 2fbc3b88f3..362b1f724f 100644
> > --- a/tests/qtest/libqtest.c
> > +++ b/tests/qtest/libqtest.c
> > @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
> >   {
> >   #ifndef _WIN32
> >   pid_t pid;
> > +uint64_t end;
> > +
> > +/* poll for 10s until sending SIGKILL */
> > +end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
> 
> Maybe we could use getenv() to allow tuning / using different value?

I'd rather we picked a value large enough that it will work
reliably out of the box for all scenarios with no magic
env required. We're just trying to prevent infinite waits if
something unexpected happens. We don't need to use an
aggressively short value, as most users will never hit this
scenario. I think 30 seconds is large enough to be reliable
but we could easily go higher to 60/120 if we want to be
really really sure.
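
A minimal standalone sketch of the poll-then-SIGKILL pattern under discussion, with the timeout passed in as a parameter so that 30, 60 or 120 seconds is a one-line change at the call site. It uses plain POSIX calls (clock_gettime, nanosleep) instead of the GLib helpers in the patch so that it builds on its own; `demo_now_ms` and `demo_wait_or_kill` are hypothetical names, not libqtest functions.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds. */
static long long demo_now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/* Poll for the child for up to timeout_ms. If it has not changed state
 * by then, send SIGKILL and reap it with a blocking waitpid().
 * Returns the reaped pid, or -1 on a waitpid() error. */
static pid_t demo_wait_or_kill(pid_t child, int *wstatus,
                               long long timeout_ms)
{
    const struct timespec nap = { 0, 100 * 1000 * 1000 }; /* 100 ms */
    long long end = demo_now_ms() + timeout_ms;
    pid_t pid;

    do {
        pid = waitpid(child, wstatus, WNOHANG);
        if (pid != 0) {
            return pid;            /* child reaped, or error */
        }
        nanosleep(&nap, NULL);
    } while (demo_now_ms() < end);

    kill(child, SIGKILL);
    do {                           /* retry if interrupted by a signal */
        pid = waitpid(child, wstatus, 0);
    } while (pid == -1 && errno == EINTR);
    return pid;
}
```

A caller following the suggestion above would use `demo_wait_or_kill(pid, &status, 30 * 1000)`. A well-behaved child is reaped on the first successful poll, so the larger timeout only costs time in the failure case.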


With regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




[PATCH 01/31] e1000e: Fix the code style

2023-01-12 Thread Akihiko Odaki
The igb implementation starts off by copying the e1000e code. Correct the
code style before that.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000.c |  41 
 hw/net/e1000e.c|  72 ++--
 hw/net/e1000e_core.c   | 103 ++---
 hw/net/e1000e_core.h   |  66 +-
 hw/net/e1000x_common.h |  44 +-
 5 files changed, 168 insertions(+), 158 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index e26e0a64c1..ac6ce0af21 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -808,10 +808,11 @@ receive_filter(E1000State *s, const uint8_t *buf, int 
size)
 if (e1000x_is_vlan_packet(buf, le16_to_cpu(s->mac_reg[VET])) &&
 e1000x_vlan_rx_filter_enabled(s->mac_reg)) {
 uint16_t vid = lduw_be_p(buf + 14);
-uint32_t vfta = ldl_le_p((uint32_t*)(s->mac_reg + VFTA) +
+uint32_t vfta = ldl_le_p((uint32_t *)(s->mac_reg + VFTA) +
  ((vid >> 5) & 0x7f));
-if ((vfta & (1 << (vid & 0x1f))) == 0)
+if ((vfta & (1 << (vid & 0x1f))) == 0) {
 return 0;
+}
 }
 
 if (!isbcast && !ismcast && (rctl & E1000_RCTL_UPE)) { /* promiscuous 
ucast */
@@ -1220,16 +1221,16 @@ static const readops macreg_readops[] = {
 [TDFPC]   = mac_low13_read,
 [AIT] = mac_low16_read,
 
-[CRCERRS ... MPC]   = &mac_readreg,
-[IP6AT ... IP6AT+3] = &mac_readreg,[IP4AT ... IP4AT+6] = &mac_readreg,
-[FFLT ... FFLT+6]   = &mac_low11_read,
-[RA ... RA+31]  = &mac_readreg,
-[WUPM ... WUPM+31]  = &mac_readreg,
-[MTA ... MTA+127]   = &mac_readreg,
-[VFTA ... VFTA+127] = &mac_readreg,
-[FFMT ... FFMT+254] = &mac_low4_read,
-[FFVT ... FFVT+254] = &mac_readreg,
-[PBM ... PBM+16383] = &mac_readreg,
+[CRCERRS ... MPC] = &mac_readreg,
+[IP6AT ... IP6AT + 3] = &mac_readreg,[IP4AT ... IP4AT + 6] = 
&mac_readreg,
+[FFLT ... FFLT + 6]   = &mac_low11_read,
+[RA ... RA + 31]  = &mac_readreg,
+[WUPM ... WUPM + 31]  = &mac_readreg,
+[MTA ... MTA + 127]   = &mac_readreg,
+[VFTA ... VFTA + 127] = &mac_readreg,
+[FFMT ... FFMT + 254] = &mac_low4_read,
+[FFVT ... FFVT + 254] = &mac_readreg,
+[PBM ... PBM + 16383] = &mac_readreg,
 };
 enum { NREADOPS = ARRAY_SIZE(macreg_readops) };
 
@@ -1252,14 +1253,14 @@ static const writeops macreg_writeops[] = {
 [RDTR]   = set_16bit,  [RADV]   = set_16bit,  [TADV] = set_16bit,
 [ITR]= set_16bit,
 
-[IP6AT ... IP6AT+3] = &mac_writereg, [IP4AT ... IP4AT+6] = &mac_writereg,
-[FFLT ... FFLT+6]   = &mac_writereg,
-[RA ... RA+31]  = &mac_writereg,
-[WUPM ... WUPM+31]  = &mac_writereg,
-[MTA ... MTA+127]   = &mac_writereg,
-[VFTA ... VFTA+127] = &mac_writereg,
-[FFMT ... FFMT+254] = &mac_writereg, [FFVT ... FFVT+254] = &mac_writereg,
-[PBM ... PBM+16383] = &mac_writereg,
+[IP6AT ... IP6AT + 3] = &mac_writereg, [IP4AT ... IP4AT + 6] = 
&mac_writereg,
+[FFLT ... FFLT + 6]   = &mac_writereg,
+[RA ... RA + 31]  = &mac_writereg,
+[WUPM ... WUPM + 31]  = &mac_writereg,
+[MTA ... MTA + 127]   = &mac_writereg,
+[VFTA ... VFTA + 127] = &mac_writereg,
+[FFMT ... FFMT + 254] = &mac_writereg, [FFVT ... FFVT + 254] = 
&mac_writereg,
+[PBM ... PBM + 16383] = &mac_writereg,
 };
 
 enum { NWRITEOPS = ARRAY_SIZE(macreg_writeops) };
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 7523e9f5d2..8635ca16c6 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -1,37 +1,37 @@
 /*
-* QEMU INTEL 82574 GbE NIC emulation
-*
-* Software developer's manuals:
-* 
http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf
-*
-* Copyright (c) 2015 Ravello Systems LTD (http://ravellosystems.com)
-* Developed by Daynix Computing LTD (http://www.daynix.com)
-*
-* Authors:
-* Dmitry Fleytman 
-* Leonid Bloch 
-* Yan Vugenfirer 
-*
-* Based on work done by:
-* Nir Peleg, Tutis Systems Ltd. for Qumranet Inc.
-* Copyright (c) 2008 Qumranet
-* Based on work done by:
-* Copyright (c) 2007 Dan Aloni
-* Copyright (c) 2004 Antony T Curtis
-*
-* This library is free software; you can redistribute it and/or
-* modify it under the terms of the GNU Lesser General Public
-* License as published by the Free Software Foundation; either
-* version 2.1 of the License, or (at your option) any later version.
-*
-* This library is distributed in the hope that it will be useful,
-* but WITHOUT ANY WARRANTY; without even the implied warranty of
-* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
-* Lesser General Public License for more details.
-*
-* You should have received a copy of the GNU Lesser General Public
-* License along with this library; if not, see .
-*/
+ * QEMU INTEL 82574 GbE NIC emulation
+ *
+ * Software developer's manuals:
+ * 
http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller

[PATCH 05/31] e1000: Mask registers when writing

2023-01-12 Thread Akihiko Odaki
When a register has fewer effective bits than its width, the old code was
inconsistent about whether it masked on write or on read. Make the code
consistent by always masking when writing, and remove some code duplication.
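
The idea can be sketched outside the device model as follows. The register array and the `demo_*` / `DEMO_*` names are hypothetical, and the mask is written out with a plain shift rather than QEMU's BIT() macro so the snippet stands alone.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_NREGS 4
static uint32_t demo_reg[DEMO_NREGS];

/* Generate a setter that keeps only the low `num` bits (num < 32). */
#define DEMO_LOW_BITS_SET_FUNC(num)                            \
static void demo_set_##num##bit(int index, uint32_t val)       \
{                                                              \
    demo_reg[index] = val & ((UINT32_C(1) << (num)) - 1);      \
}

DEMO_LOW_BITS_SET_FUNC(4)
DEMO_LOW_BITS_SET_FUNC(13)

/* The stored value is already masked, so a plain read is enough. */
static uint32_t demo_readreg(int index)
{
    return demo_reg[index];
}
```

Because the mask is applied once at write time, every reader sees the same value regardless of which accessor it goes through.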

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000.c | 84 +++---
 1 file changed, 31 insertions(+), 53 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 246e7670a8..7c28200cab 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1062,30 +1062,6 @@ mac_readreg(E1000State *s, int index)
 return s->mac_reg[index];
 }
 
-static uint32_t
-mac_low4_read(E1000State *s, int index)
-{
-return s->mac_reg[index] & 0xf;
-}
-
-static uint32_t
-mac_low11_read(E1000State *s, int index)
-{
-return s->mac_reg[index] & 0x7ff;
-}
-
-static uint32_t
-mac_low13_read(E1000State *s, int index)
-{
-return s->mac_reg[index] & 0x1fff;
-}
-
-static uint32_t
-mac_low16_read(E1000State *s, int index)
-{
-return s->mac_reg[index] & 0x;
-}
-
 static uint32_t
 mac_icr_read(E1000State *s, int index)
 {
@@ -1138,11 +1114,17 @@ set_rdt(E1000State *s, int index, uint32_t val)
 }
 }
 
-static void
-set_16bit(E1000State *s, int index, uint32_t val)
-{
-s->mac_reg[index] = val & 0xffff;
-}
+#define LOW_BITS_SET_FUNC(num) \
+static void\
+set_##num##bit(E1000State *s, int index, uint32_t val) \
+{  \
+s->mac_reg[index] = val & (BIT(num) - 1);  \
+}
+
+LOW_BITS_SET_FUNC(4)
+LOW_BITS_SET_FUNC(11)
+LOW_BITS_SET_FUNC(13)
+LOW_BITS_SET_FUNC(16)
 
 static void
 set_dlen(E1000State *s, int index, uint32_t val)
@@ -1196,7 +1178,9 @@ static const readops macreg_readops[] = {
 getreg(XONRXC),   getreg(XONTXC),   getreg(XOFFRXC),  getreg(XOFFTXC),
 getreg(RFC),  getreg(RJC),  getreg(RNBC), getreg(TSCTFC),
 getreg(MGTPRC),   getreg(MGTPDC),   getreg(MGTPTC),   getreg(GORCL),
-getreg(GOTCL),
+getreg(GOTCL),getreg(RDFH), getreg(RDFT), getreg(RDFHS),
+getreg(RDFTS),getreg(RDFPC),getreg(TDFH), getreg(TDFT),
+getreg(TDFHS),getreg(TDFTS),getreg(TDFPC),getreg(AIT),
 
 [TOTH]= mac_read_clr8,  [TORH]= mac_read_clr8,
 [GOTCH]   = mac_read_clr8,  [GORCH]   = mac_read_clr8,
@@ -1214,22 +1198,15 @@ static const readops macreg_readops[] = {
 [MPTC]= mac_read_clr4,
 [ICR] = mac_icr_read,   [EECD]= get_eecd,
 [EERD]= flash_eerd_read,
-[RDFH]= mac_low13_read, [RDFT]= mac_low13_read,
-[RDFHS]   = mac_low13_read, [RDFTS]   = mac_low13_read,
-[RDFPC]   = mac_low13_read,
-[TDFH]= mac_low11_read, [TDFT]= mac_low11_read,
-[TDFHS]   = mac_low13_read, [TDFTS]   = mac_low13_read,
-[TDFPC]   = mac_low13_read,
-[AIT] = mac_low16_read,
 
 [CRCERRS ... MPC] = &mac_readreg,
 [IP6AT ... IP6AT + 3] = &mac_readreg,[IP4AT ... IP4AT + 6] = &mac_readreg,
-[FFLT ... FFLT + 6]   = &mac_low11_read,
+[FFLT ... FFLT + 6]   = &mac_readreg,
 [RA ... RA + 31]  = &mac_readreg,
 [WUPM ... WUPM + 31]  = &mac_readreg,
 [MTA ... MTA + 127]   = &mac_readreg,
 [VFTA ... VFTA + 127] = &mac_readreg,
-[FFMT ... FFMT + 254] = &mac_low4_read,
+[FFMT ... FFMT + 254] = &mac_readreg,
 [FFVT ... FFVT + 254] = &mac_readreg,
 [PBM ... PBM + 16383] = &mac_readreg,
 };
@@ -1241,26 +1218,27 @@ static const writeops macreg_writeops[] = {
 putreg(PBA),  putreg(EERD), putreg(SWSM), putreg(WUFC),
 putreg(TDBAL),putreg(TDBAH),putreg(TXDCTL),   putreg(RDBAH),
 putreg(RDBAL),putreg(LEDCTL),   putreg(VET),  putreg(FCRUC),
-putreg(TDFH), putreg(TDFT), putreg(TDFHS),putreg(TDFTS),
-putreg(TDFPC),putreg(RDFH), putreg(RDFT), putreg(RDFHS),
-putreg(RDFTS),putreg(RDFPC),putreg(IPAV), putreg(WUC),
-putreg(WUS),  putreg(AIT),
-
-[TDLEN]  = set_dlen,   [RDLEN]  = set_dlen,   [TCTL] = set_tctl,
-[TDT]= set_tctl,   [MDIC]   = set_mdic,   [ICS]  = set_ics,
-[TDH]= set_16bit,  [RDH]= set_16bit,  [RDT]  = set_rdt,
-[IMC]= set_imc,[IMS]= set_ims,[ICR]  = set_icr,
-[EECD]   = set_eecd,   [RCTL]   = set_rx_control, [CTRL] = set_ctrl,
-[RDTR]   = set_16bit,  [RADV]   = set_16bit,  [TADV] = set_16bit,
-[ITR]= set_16bit,
+putreg(IPAV), putreg(WUC),
+putreg(WUS),
+
+[TDLEN]  = set_dlen,   [RDLEN]  = set_dlen,   [TCTL]  = set_tctl,
+[TDT]= set_tctl,   [MDIC]   = set_mdic,   [ICS]   = set_ics,
+[TDH]= set_16bit,  [RDH]= set_16bit,  [RDT]   = set_rdt,
+[IMC]= set_imc,[IMS]= set_ims,[ICR]   = set_icr,
+[EECD]   = set_eecd,   [RCTL]   = set_rx_control, [CTRL]  = set_ctrl,
+[RDTR]   = set_16bit,  [RADV]   =

[PATCH 04/31] e1000: Use hw/net/mii.h

2023-01-12 Thread Akihiko Odaki
hw/net/mii.h provides common definitions for MII.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000.c | 86 ++--
 hw/net/e1000_regs.h| 46 
 hw/net/e1000e.c|  1 +
 hw/net/e1000e_core.c   | 99 +-
 hw/net/e1000x_common.c |  5 ++-
 hw/net/e1000x_common.h |  8 ++--
 6 files changed, 101 insertions(+), 144 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index ac6ce0af21..246e7670a8 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -26,6 +26,7 @@
 
 
 #include "qemu/osdep.h"
+#include "hw/net/mii.h"
 #include "hw/pci/pci.h"
 #include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
@@ -181,67 +182,67 @@ e1000_autoneg_done(E1000State *s)
 static bool
 have_autoneg(E1000State *s)
 {
-return chkflag(AUTONEG) && (s->phy_reg[PHY_CTRL] & MII_CR_AUTO_NEG_EN);
+return chkflag(AUTONEG) && (s->phy_reg[MII_BMCR] & MII_BMCR_AUTOEN);
 }
 
 static void
 set_phy_ctrl(E1000State *s, int index, uint16_t val)
 {
-/* bits 0-5 reserved; MII_CR_[RESTART_AUTO_NEG,RESET] are self clearing */
-s->phy_reg[PHY_CTRL] = val & ~(0x3f |
-   MII_CR_RESET |
-   MII_CR_RESTART_AUTO_NEG);
+/* bits 0-5 reserved; MII_BMCR_[ANRESTART,RESET] are self clearing */
+s->phy_reg[MII_BMCR] = val & ~(0x3f |
+   MII_BMCR_RESET |
+   MII_BMCR_ANRESTART);
 
 /*
  * QEMU 1.3 does not support link auto-negotiation emulation, so if we
  * migrate during auto negotiation, after migration the link will be
  * down.
  */
-if (have_autoneg(s) && (val & MII_CR_RESTART_AUTO_NEG)) {
+if (have_autoneg(s) && (val & MII_BMCR_ANRESTART)) {
 e1000x_restart_autoneg(s->mac_reg, s->phy_reg, s->autoneg_timer);
 }
 }
 
 static void (*phyreg_writeops[])(E1000State *, int, uint16_t) = {
-[PHY_CTRL] = set_phy_ctrl,
+[MII_BMCR] = set_phy_ctrl,
 };
 
 enum { NPHYWRITEOPS = ARRAY_SIZE(phyreg_writeops) };
 
 enum { PHY_R = 1, PHY_W = 2, PHY_RW = PHY_R | PHY_W };
 static const char phy_regcap[0x20] = {
-[PHY_STATUS]  = PHY_R, [M88E1000_EXT_PHY_SPEC_CTRL] = PHY_RW,
-[PHY_ID1] = PHY_R, [M88E1000_PHY_SPEC_CTRL] = PHY_RW,
-[PHY_CTRL]= PHY_RW,[PHY_1000T_CTRL] = PHY_RW,
-[PHY_LP_ABILITY]  = PHY_R, [PHY_1000T_STATUS]   = PHY_R,
-[PHY_AUTONEG_ADV] = PHY_RW,[M88E1000_RX_ERR_CNTR]   = PHY_R,
-[PHY_ID2] = PHY_R, [M88E1000_PHY_SPEC_STATUS]   = PHY_R,
-[PHY_AUTONEG_EXP] = PHY_R,
+[MII_BMSR] = PHY_R,   [M88E1000_EXT_PHY_SPEC_CTRL] = PHY_RW,
+[MII_PHYID1] = PHY_R, [M88E1000_PHY_SPEC_CTRL] = PHY_RW,
+[MII_BMCR]   = PHY_RW,[MII_CTRL1000]   = PHY_RW,
+[MII_ANLPAR] = PHY_R, [MII_STAT1000]   = PHY_R,
+[MII_ANAR]   = PHY_RW,[M88E1000_RX_ERR_CNTR]   = PHY_R,
+[MII_PHYID2] = PHY_R, [M88E1000_PHY_SPEC_STATUS]   = PHY_R,
+[MII_ANER]   = PHY_R,
 };
 
-/* PHY_ID2 documented in 8254x_GBe_SDM.pdf, pp. 250 */
+/* MII_PHYID2 documented in 8254x_GBe_SDM.pdf, pp. 250 */
 static const uint16_t phy_reg_init[] = {
-[PHY_CTRL]   = MII_CR_SPEED_SELECT_MSB |
-   MII_CR_FULL_DUPLEX |
-   MII_CR_AUTO_NEG_EN,
-
-[PHY_STATUS] = MII_SR_EXTENDED_CAPS |
-   MII_SR_LINK_STATUS |   /* link initially up */
-   MII_SR_AUTONEG_CAPS |
-   /* MII_SR_AUTONEG_COMPLETE: initially NOT completed */
-   MII_SR_PREAMBLE_SUPPRESS |
-   MII_SR_EXTENDED_STATUS |
-   MII_SR_10T_HD_CAPS |
-   MII_SR_10T_FD_CAPS |
-   MII_SR_100X_HD_CAPS |
-   MII_SR_100X_FD_CAPS,
-
-[PHY_ID1] = 0x141,
-/* [PHY_ID2] configured per DevId, from e1000_reset() */
-[PHY_AUTONEG_ADV] = 0xde1,
-[PHY_LP_ABILITY] = 0x1e0,
-[PHY_1000T_CTRL] = 0x0e00,
-[PHY_1000T_STATUS] = 0x3c00,
+[MII_BMCR] = MII_BMCR_SPEED1000 |
+ MII_BMCR_FD |
+ MII_BMCR_AUTOEN,
+
+[MII_BMSR] = MII_BMSR_EXTCAP |
+ MII_BMSR_LINK_ST |   /* link initially up */
+ MII_BMSR_AUTONEG |
+ /* MII_BMSR_AN_COMP: initially NOT completed */
+ MII_BMSR_MFPS |
+ MII_BMSR_EXTSTAT |
+ MII_BMSR_10T_HD |
+ MII_BMSR_10T_FD |
+ MII_BMSR_100TX_HD |
+ MII_BMSR_100TX_FD,
+
+[MII_PHYID1] = 0x141,
+/* [MII_PHYID2] configured per DevId, from e1000_reset() */
+[MII_ANAR] = 0xde1,
+[MII_ANLPAR] = 0x1e0,
+[MII_CTRL1000] = 0x0e00,
+[MII_STAT1000] = 0x3c00,
 [M88E1000_PHY_SPEC_CTRL] = 0x360,
 [M88E1000_PHY_SPEC_STATUS] = 0xac00,
 [M88E1000_EXT_PHY_SPEC_CTRL] = 0x0d60,
@@ -387,7 +388,7 @@ static void e1000_rese

[PATCH 27/31] tests/qtest/libqos/e1000e: Export macreg functions

2023-01-12 Thread Akihiko Odaki
They will be useful for igb testing.

Signed-off-by: Akihiko Odaki 
---
 tests/qtest/libqos/e1000e.c | 12 
 tests/qtest/libqos/e1000e.h | 12 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/qtest/libqos/e1000e.c b/tests/qtest/libqos/e1000e.c
index 28fb3052aa..925654c7fd 100644
--- a/tests/qtest/libqos/e1000e.c
+++ b/tests/qtest/libqos/e1000e.c
@@ -36,18 +36,6 @@
 
 #define E1000E_RING_LEN (0x1000)
 
-static void e1000e_macreg_write(QE1000E *d, uint32_t reg, uint32_t val)
-{
-QE1000E_PCI *d_pci = container_of(d, QE1000E_PCI, e1000e);
-qpci_io_writel(&d_pci->pci_dev, d_pci->mac_regs, reg, val);
-}
-
-static uint32_t e1000e_macreg_read(QE1000E *d, uint32_t reg)
-{
-QE1000E_PCI *d_pci = container_of(d, QE1000E_PCI, e1000e);
-return qpci_io_readl(&d_pci->pci_dev, d_pci->mac_regs, reg);
-}
-
 void e1000e_tx_ring_push(QE1000E *d, void *descr)
 {
 QE1000E_PCI *d_pci = container_of(d, QE1000E_PCI, e1000e);
diff --git a/tests/qtest/libqos/e1000e.h b/tests/qtest/libqos/e1000e.h
index 5e2b201aa7..30643c8094 100644
--- a/tests/qtest/libqos/e1000e.h
+++ b/tests/qtest/libqos/e1000e.h
@@ -42,6 +42,18 @@ struct QE1000E_PCI {
 QE1000E e1000e;
 };
 
+static inline void e1000e_macreg_write(QE1000E *d, uint32_t reg, uint32_t val)
+{
+QE1000E_PCI *d_pci = container_of(d, QE1000E_PCI, e1000e);
+qpci_io_writel(&d_pci->pci_dev, d_pci->mac_regs, reg, val);
+}
+
+static inline uint32_t e1000e_macreg_read(QE1000E *d, uint32_t reg)
+{
+QE1000E_PCI *d_pci = container_of(d, QE1000E_PCI, e1000e);
+return qpci_io_readl(&d_pci->pci_dev, d_pci->mac_regs, reg);
+}
+
 void e1000e_wait_isr(QE1000E *d, uint16_t msg_id);
 void e1000e_tx_ring_push(QE1000E *d, void *descr);
 void e1000e_rx_ring_push(QE1000E *d, void *descr);
-- 
2.39.0




[PATCH 21/31] e1000: Split header files

2023-01-12 Thread Akihiko Odaki
Some definitions in the header files are invalid for igb so extract
them to new header files to keep igb from referring to them.

Signed-off-by: Gal Hammer 
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000.c |   2 +-
 hw/net/e1000_common.h  | 104 +
 hw/net/e1000_regs.h| 927 +---
 hw/net/e1000e.c|   4 +-
 hw/net/e1000e_core.c   |   2 +-
 hw/net/e1000x_common.c |   2 +-
 hw/net/e1000x_common.h |  74 
 hw/net/e1000x_regs.h   | 940 +
 8 files changed, 1051 insertions(+), 1004 deletions(-)
 create mode 100644 hw/net/e1000_common.h
 create mode 100644 hw/net/e1000x_regs.h

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index a66cb39c8b..30b9d039f8 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -39,7 +39,7 @@
 #include "qemu/module.h"
 #include "qemu/range.h"
 
-#include "e1000x_common.h"
+#include "e1000_common.h"
 #include "trace.h"
 #include "qom/object.h"
 
diff --git a/hw/net/e1000_common.h b/hw/net/e1000_common.h
new file mode 100644
index 00..56afad3feb
--- /dev/null
+++ b/hw/net/e1000_common.h
@@ -0,0 +1,104 @@
+/*
+ * QEMU e1000(e) emulation - shared code
+ *
+ * Copyright (c) 2008 Qumranet
+ *
+ * Based on work done by:
+ * Nir Peleg, Tutis Systems Ltd. for Qumranet Inc.
+ * Copyright (c) 2007 Dan Aloni
+ * Copyright (c) 2004 Antony T Curtis
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see <http://www.gnu.org/licenses/>.
+ */
+
+#ifndef HW_NET_E1000_COMMON_H
+#define HW_NET_E1000_COMMON_H
+
+#include "e1000_regs.h"
+
+#define defreg(x)   x = (E1000_##x >> 2)
+enum {
+defreg(CTRL),defreg(EECD),defreg(EERD),defreg(GPRC),
+defreg(GPTC),defreg(ICR), defreg(ICS), defreg(IMC),
+defreg(IMS), defreg(LEDCTL),  defreg(MANC),defreg(MDIC),
+defreg(MPC), defreg(PBA), defreg(RCTL),defreg(RDBAH0),
+defreg(RDBAL0),  defreg(RDH0),defreg(RDLEN0),  defreg(RDT0),
+defreg(STATUS),  defreg(SWSM),defreg(TCTL),defreg(TDBAH),
+defreg(TDBAL),   defreg(TDH), defreg(TDLEN),   defreg(TDT),
+defreg(TDLEN1),  defreg(TDBAL1),  defreg(TDBAH1),  defreg(TDH1),
+defreg(TDT1),defreg(TORH),defreg(TORL),defreg(TOTH),
+defreg(TOTL),defreg(TPR), defreg(TPT), defreg(TXDCTL),
+defreg(WUFC),defreg(RA),  defreg(MTA), defreg(CRCERRS),
+defreg(VFTA),defreg(VET), defreg(RDTR),defreg(RADV),
+defreg(TADV),defreg(ITR), defreg(SCC), defreg(ECOL),
+defreg(MCC), defreg(LATECOL), defreg(COLC),defreg(DC),
+defreg(TNCRS),   defreg(SEQEC),   defreg(CEXTERR), defreg(RLEC),
+defreg(XONRXC),  defreg(XONTXC),  defreg(XOFFRXC), defreg(XOFFTXC),
+defreg(FCRUC),   defreg(AIT), defreg(TDFH),defreg(TDFT),
+defreg(TDFHS),   defreg(TDFTS),   defreg(TDFPC),   defreg(WUC),
+defreg(WUS), defreg(POEMB),   defreg(PBS), defreg(RDFH),
+defreg(RDFT),defreg(RDFHS),   defreg(RDFTS),   defreg(RDFPC),
+defreg(PBM), defreg(IPAV),defreg(IP4AT),   defreg(IP6AT),
+defreg(WUPM),defreg(FFLT),defreg(FFMT),defreg(FFVT),
+defreg(TARC0),   defreg(TARC1),   defreg(IAM), defreg(EXTCNF_CTRL),
+defreg(GCR), defreg(TIMINCA), defreg(EIAC),defreg(CTRL_EXT),
+defreg(IVAR),defreg(MFUTP01), defreg(MFUTP23), defreg(MANC2H),
+defreg(MFVAL),   defreg(MDEF),defreg(FACTPS),  defreg(FTFT),
+defreg(RUC), defreg(ROC), defreg(RFC), defreg(RJC),
+defreg(PRC64),   defreg(PRC127),  defreg(PRC255),  defreg(PRC511),
+defreg(PRC1023), defreg(PRC1522), defreg(PTC64),   defreg(PTC127),
+defreg(PTC255),  defreg(PTC511),  defreg(PTC1023), defreg(PTC1522),
+defreg(GORCL),   defreg(GORCH),   defreg(GOTCL),   defreg(GOTCH),
+defreg(RNBC),defreg(BPRC),defreg(MPRC),defreg(RFCTL),
+defreg(PSRCTL),  defreg(MPTC),defreg(BPTC),defreg(TSCTFC),
+defreg(IAC), defreg(MGTPRC),  defreg(MGTPDC),  defreg(MGTPTC),
+defreg(TSCTC),   defreg(RXCSUM),  defreg(FUNCTAG), defreg(GSCL_1),
+defreg(GSCL_2),  defreg(GSCL_3),  defreg(GSCL_4),  defreg(GSCN_0),
+defreg(GSCN_1),  defreg(GSCN_2),  defreg(GSCN_3),  defreg(GCR2),
+defreg(RAID),defreg(RSRPD),   defreg(TIDV),defreg(EITR),
+defreg(MRQC),defreg(RETA),defreg(RSSRK),   de

[PATCH 15/31] e1000e: Introduce e1000_rx_desc_union

2023-01-12 Thread Akihiko Odaki
Before this change, e1000e_write_packet_to_guest() allocated the
receive descriptor buffer as an array of uint8_t. This does not ensure
the buffer is sufficiently aligned.

Introduce e1000_rx_desc_union type, a union type of all receive
descriptor types to correct this.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000_regs.h  |   1 -
 hw/net/e1000e_core.c | 115 +--
 2 files changed, 57 insertions(+), 59 deletions(-)
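
To illustrate the alignment point, a small sketch with simplified,
hypothetical descriptor layouts (the demo_* types are not the real e1000
structures, and endianness conversion is left out). A union of the
descriptor types inherits the strictest alignment of its members, which a
plain uint8_t array does not guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified layouts for illustration only. */
struct demo_rx_desc {
    uint64_t buffer_addr;
    uint16_t length;
    uint16_t csum;
    uint8_t status;
    uint8_t errors;
    uint16_t special;
};

union demo_rx_desc_extended {
    struct {
        uint64_t buffer_addr;
        uint64_t reserved;
    } read;
};

union demo_rx_desc_union {
    struct demo_rx_desc legacy;
    union demo_rx_desc_extended extended;
};

/* The union is at least as aligned as its most demanding member. */
static_assert(_Alignof(union demo_rx_desc_union) >= _Alignof(uint64_t),
              "descriptor buffer is aligned for 64-bit fields");

/* Typed accessor: no cast from uint8_t * is needed any more. */
static uint64_t
demo_read_lgcy_buffer_addr(const struct demo_rx_desc *desc)
{
    assert(desc != NULL);
    return desc->buffer_addr;
}

uint64_t
demo_roundtrip(uint64_t addr)
{
    union demo_rx_desc_union desc;

    memset(&desc, 0, sizeof(desc));
    desc.legacy.buffer_addr = addr;
    return demo_read_lgcy_buffer_addr(&desc.legacy);
}
```

Passing &desc.legacy (or &desc.extended) also lets the compiler check the
descriptor type at each call site instead of relying on casts.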

diff --git a/hw/net/e1000_regs.h b/hw/net/e1000_regs.h
index 6a36573802..4545fe25a6 100644
--- a/hw/net/e1000_regs.h
+++ b/hw/net/e1000_regs.h
@@ -1061,7 +1061,6 @@ union e1000_rx_desc_packet_split {
 #define E1000_RING_DESC_LEN_SHIFT (4)
 
 #define E1000_MIN_RX_DESC_LEN   E1000_RING_DESC_LEN
-#define E1000_MAX_RX_DESC_LEN   (sizeof(union e1000_rx_desc_packet_split))
 
 /* Receive Descriptor bit definitions */
 #define E1000_RXD_STAT_DD   0x01/* Descriptor Done */
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index b8670662c8..d8c17baf8f 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -55,6 +55,12 @@
 
 #define E1000E_MAX_TX_FRAGS (64)
 
+union e1000_rx_desc_union {
+struct e1000_rx_desc legacy;
+union e1000_rx_desc_extended extended;
+union e1000_rx_desc_packet_split packet_split;
+};
+
 static inline void
 e1000e_set_interrupt_cause(E1000ECore *core, uint32_t val);
 
@@ -1053,29 +1059,28 @@ e1000e_receive_filter(E1000ECore *core, const uint8_t *buf, int size)
 }
 
 static inline void
-e1000e_read_lgcy_rx_descr(E1000ECore *core, uint8_t *desc, hwaddr *buff_addr)
+e1000e_read_lgcy_rx_descr(E1000ECore *core, struct e1000_rx_desc *desc,
+  hwaddr *buff_addr)
 {
-struct e1000_rx_desc *d = (struct e1000_rx_desc *) desc;
-*buff_addr = le64_to_cpu(d->buffer_addr);
+*buff_addr = le64_to_cpu(desc->buffer_addr);
 }
 
 static inline void
-e1000e_read_ext_rx_descr(E1000ECore *core, uint8_t *desc, hwaddr *buff_addr)
+e1000e_read_ext_rx_descr(E1000ECore *core, union e1000_rx_desc_extended *desc,
+ hwaddr *buff_addr)
 {
-union e1000_rx_desc_extended *d = (union e1000_rx_desc_extended *) desc;
-*buff_addr = le64_to_cpu(d->read.buffer_addr);
+*buff_addr = le64_to_cpu(desc->read.buffer_addr);
 }
 
 static inline void
-e1000e_read_ps_rx_descr(E1000ECore *core, uint8_t *desc,
+e1000e_read_ps_rx_descr(E1000ECore *core,
+union e1000_rx_desc_packet_split *desc,
 hwaddr (*buff_addr)[MAX_PS_BUFFERS])
 {
 int i;
-union e1000_rx_desc_packet_split *d =
-(union e1000_rx_desc_packet_split *) desc;
 
 for (i = 0; i < MAX_PS_BUFFERS; i++) {
-(*buff_addr)[i] = le64_to_cpu(d->read.buffer_addr[i]);
+(*buff_addr)[i] = le64_to_cpu(desc->read.buffer_addr[i]);
 }
 
 trace_e1000e_rx_desc_ps_read((*buff_addr)[0], (*buff_addr)[1],
@@ -1083,17 +1088,17 @@ e1000e_read_ps_rx_descr(E1000ECore *core, uint8_t *desc,
 }
 
 static inline void
-e1000e_read_rx_descr(E1000ECore *core, uint8_t *desc,
+e1000e_read_rx_descr(E1000ECore *core, union e1000_rx_desc_union *desc,
  hwaddr (*buff_addr)[MAX_PS_BUFFERS])
 {
 if (e1000e_rx_use_legacy_descriptor(core)) {
-e1000e_read_lgcy_rx_descr(core, desc, &(*buff_addr)[0]);
+e1000e_read_lgcy_rx_descr(core, &desc->legacy, &(*buff_addr)[0]);
 (*buff_addr)[1] = (*buff_addr)[2] = (*buff_addr)[3] = 0;
 } else {
 if (core->mac[RCTL] & E1000_RCTL_DTYP_PS) {
-e1000e_read_ps_rx_descr(core, desc, buff_addr);
+e1000e_read_ps_rx_descr(core, &desc->packet_split, buff_addr);
 } else {
-e1000e_read_ext_rx_descr(core, desc, &(*buff_addr)[0]);
+e1000e_read_ext_rx_descr(core, &desc->extended, &(*buff_addr)[0]);
 (*buff_addr)[1] = (*buff_addr)[2] = (*buff_addr)[3] = 0;
 }
 }
@@ -1264,7 +1269,7 @@ func_exit:
 }
 
 static inline void
-e1000e_write_lgcy_rx_descr(E1000ECore *core, uint8_t *desc,
+e1000e_write_lgcy_rx_descr(E1000ECore *core, struct e1000_rx_desc *desc,
struct NetRxPkt *pkt,
const E1000E_RSSInfo *rss_info,
uint16_t length)
@@ -1272,71 +1277,66 @@ e1000e_write_lgcy_rx_descr(E1000ECore *core, uint8_t *desc,
 uint32_t status_flags, rss, mrq;
 uint16_t ip_id;
 
-struct e1000_rx_desc *d = (struct e1000_rx_desc *) desc;
-
 assert(!rss_info->enabled);
 
-d->length = cpu_to_le16(length);
-d->csum = 0;
+desc->length = cpu_to_le16(length);
+desc->csum = 0;
 
 e1000e_build_rx_metadata(core, pkt, pkt != NULL,
  rss_info,
  &rss, &mrq,
  &status_flags, &ip_id,
- &d->special);
-d->errors = (uint8_t) (le32_to_cpu(status_flags) >> 24);
-d->status = (uint8_t) le32_to_cpu(status_flags);
+ 

[PATCH 09/31] e1000: Use memcpy to initialize registers

2023-01-12 Thread Akihiko Odaki
Use memcpy instead of memmove to initialize registers. The initial
register templates and register table instances will never overlap.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
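
A standalone sketch of the reset pattern, assuming a hypothetical DemoPhy
device with made-up register indices. The constant template and the
per-device register array are distinct objects, so they cannot overlap and
memcpy is sufficient.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Register indices and sizes are invented for this sketch. */
enum { DEMO_PHY_ID1 = 2, DEMO_PHY_ID2 = 3, DEMO_NUM_PHY_REGS = 0x20 };

typedef struct {
    uint16_t phy_reg[DEMO_NUM_PHY_REGS];
} DemoPhy;

/* Constant template: its storage is separate from any device instance. */
static const uint16_t demo_phy_reg_init[] = {
    [DEMO_PHY_ID1] = 0x141,
};

static_assert(sizeof demo_phy_reg_init <= sizeof(((DemoPhy *)0)->phy_reg),
              "template must fit in the register array");

void
demo_phy_reset(DemoPhy *d, uint16_t phy_id2)
{
    /* Clear everything, then overlay the (shorter) template. */
    memset(d->phy_reg, 0, sizeof d->phy_reg);
    memcpy(d->phy_reg, demo_phy_reg_init, sizeof demo_phy_reg_init);
    /* Per-device value filled in after the template, as in e1000_reset(). */
    d->phy_reg[DEMO_PHY_ID2] = phy_id2;
}
```

memmove would also be correct here; memcpy simply states the non-overlap
assumption explicitly.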

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 8412a751ae..1bcc0cd4f3 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -390,10 +390,10 @@ static void e1000_reset(void *opaque)
 d->mit_irq_level = 0;
 d->mit_ide = 0;
 memset(d->phy_reg, 0, sizeof d->phy_reg);
-memmove(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
+memcpy(d->phy_reg, phy_reg_init, sizeof phy_reg_init);
 d->phy_reg[MII_PHYID2] = edc->phy_id2;
 memset(d->mac_reg, 0, sizeof d->mac_reg);
-memmove(d->mac_reg, mac_reg_init, sizeof mac_reg_init);
+memcpy(d->mac_reg, mac_reg_init, sizeof mac_reg_init);
 d->rxbuf_min_shift = 1;
 memset(&d->tx, 0, sizeof d->tx);
 
-- 
2.39.0




[PATCH 02/31] hw/net: Add more MII definitions

2023-01-12 Thread Akihiko Odaki
The definitions will be used by igb.

Signed-off-by: Akihiko Odaki 
---
 include/hw/net/mii.h | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/include/hw/net/mii.h b/include/hw/net/mii.h
index 4ae4dcce7e..c6a767a49a 100644
--- a/include/hw/net/mii.h
+++ b/include/hw/net/mii.h
@@ -81,20 +81,31 @@
 #define MII_ANLPAR_ACK  (1 << 14)
 #define MII_ANLPAR_PAUSEASY (1 << 11) /* can pause asymmetrically */
 #define MII_ANLPAR_PAUSE(1 << 10) /* can pause */
+#define MII_ANLPAR_T4   (1 << 9)
 #define MII_ANLPAR_TXFD (1 << 8)
 #define MII_ANLPAR_TX   (1 << 7)
 #define MII_ANLPAR_10FD (1 << 6)
 #define MII_ANLPAR_10   (1 << 5)
 #define MII_ANLPAR_CSMACD   (1 << 0)
 
-#define MII_ANER_NWAY   (1 << 0) /* Can do N-way auto-nego */
+#define MII_ANER_NP (1 << 2)  /* Next Page Able */
+#define MII_ANER_NWAY   (1 << 0)  /* Can do N-way auto-nego */
 
+#define MII_ANNP_MP (1 << 13) /* Message Page */
+
+#define MII_CTRL1000_MASTER (1 << 11) /* MASTER-SLAVE Manual Configuration Value */
+#define MII_CTRL1000_PORT   (1 << 10) /* T2_Repeater/DTE bit */
 #define MII_CTRL1000_FULL   (1 << 9)  /* 1000BASE-T full duplex */
 #define MII_CTRL1000_HALF   (1 << 8)  /* 1000BASE-T half duplex */
 
+#define MII_STAT1000_LOK(1 << 13) /* Local Receiver Status */
+#define MII_STAT1000_ROK(1 << 12) /* Remote Receiver Status */
 #define MII_STAT1000_FULL   (1 << 11) /* 1000BASE-T full duplex */
 #define MII_STAT1000_HALF   (1 << 10) /* 1000BASE-T half duplex */
 
+#define MII_EXTSTAT_1000T_FD (1 << 13) /* 1000BASE-T Full Duplex */
+#define MII_EXTSTAT_1000T_HD (1 << 12) /* 1000BASE-T Half Duplex */
+
 /* List of vendor identifiers */
 /* RealTek 8201 */
 #define RTL8201CP_PHYID10x
-- 
2.39.0




[PATCH 03/31] fsl_etsec: Use hw/net/mii.h

2023-01-12 Thread Akihiko Odaki
hw/net/mii.h provides common definitions for MII.

Signed-off-by: Akihiko Odaki 
---
 hw/net/fsl_etsec/etsec.c | 11 ++-
 hw/net/fsl_etsec/etsec.h | 17 -
 hw/net/fsl_etsec/miim.c  |  5 +++--
 include/hw/net/mii.h |  1 +
 4 files changed, 10 insertions(+), 24 deletions(-)
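
As a quick sanity check of the renaming, a sketch with local DEMO_* copies
of two status bits. The bit positions are taken from the values visible in
this patch (the removed MII_SR_LINK_STATUS is 0x0004, the removed
MII_SR_100T4_CAPS is 0x8000); the DEMO_* names and the helper are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local copies for illustration; real code should include hw/net/mii.h. */
#define DEMO_BMSR_100T4   (1 << 15)
#define DEMO_BMSR_LINK_ST (1 << 2)

/* The shift-style definitions carry the same values as the old hex ones. */
static_assert(DEMO_BMSR_LINK_ST == 0x0004, "matches old MII_SR_LINK_STATUS");
static_assert(DEMO_BMSR_100T4 == 0x8000, "matches old MII_SR_100T4_CAPS");

/* Same logic as etsec_miim_link_status(), on a plain status word. */
uint16_t
demo_link_status(uint16_t phy_status, bool link_down)
{
    if (link_down) {
        phy_status &= (uint16_t)~DEMO_BMSR_LINK_ST;
    } else {
        phy_status |= DEMO_BMSR_LINK_ST;
    }
    return phy_status;
}
```

Since only the names change, the status word a guest reads should be
bit-for-bit identical before and after the conversion.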

diff --git a/hw/net/fsl_etsec/etsec.c b/hw/net/fsl_etsec/etsec.c
index b75d8e3dce..b008dbb274 100644
--- a/hw/net/fsl_etsec/etsec.c
+++ b/hw/net/fsl_etsec/etsec.c
@@ -29,6 +29,7 @@
 #include "qemu/osdep.h"
 #include "hw/sysbus.h"
 #include "hw/irq.h"
+#include "hw/net/mii.h"
 #include "hw/ptimer.h"
 #include "hw/qdev-properties.h"
 #include "etsec.h"
@@ -339,11 +340,11 @@ static void etsec_reset(DeviceState *d)
 etsec->rx_buffer_len = 0;
 
 etsec->phy_status =
-MII_SR_EXTENDED_CAPS| MII_SR_LINK_STATUS   | MII_SR_AUTONEG_CAPS  |
-MII_SR_AUTONEG_COMPLETE | MII_SR_PREAMBLE_SUPPRESS |
-MII_SR_EXTENDED_STATUS  | MII_SR_100T2_HD_CAPS | MII_SR_100T2_FD_CAPS |
-MII_SR_10T_HD_CAPS  | MII_SR_10T_FD_CAPS   | MII_SR_100X_HD_CAPS  |
-MII_SR_100X_FD_CAPS | MII_SR_100T4_CAPS;
+MII_BMSR_EXTCAP   | MII_BMSR_LINK_ST  | MII_BMSR_AUTONEG  |
+MII_BMSR_AN_COMP  | MII_BMSR_MFPS | MII_BMSR_EXTSTAT  |
+MII_BMSR_100T2_HD | MII_BMSR_100T2_FD |
+MII_BMSR_10T_HD   | MII_BMSR_10T_FD   |
+MII_BMSR_100TX_HD | MII_BMSR_100TX_FD | MII_BMSR_100T4;
 
 etsec_update_irq(etsec);
 }
diff --git a/hw/net/fsl_etsec/etsec.h b/hw/net/fsl_etsec/etsec.h
index 3c625c955c..3860864a3f 100644
--- a/hw/net/fsl_etsec/etsec.h
+++ b/hw/net/fsl_etsec/etsec.h
@@ -76,23 +76,6 @@ typedef struct eTSEC_rxtx_bd {
 #define FCB_TX_CTU (1 << 1)
 #define FCB_TX_NPH (1 << 0)
 
-/* PHY Status Register */
-#define MII_SR_EXTENDED_CAPS 0x0001/* Extended register capabilities */
-#define MII_SR_JABBER_DETECT 0x0002/* Jabber Detected */
-#define MII_SR_LINK_STATUS   0x0004/* Link Status 1 = link */
-#define MII_SR_AUTONEG_CAPS  0x0008/* Auto Neg Capable */
-#define MII_SR_REMOTE_FAULT  0x0010/* Remote Fault Detect */
-#define MII_SR_AUTONEG_COMPLETE  0x0020/* Auto Neg Complete */
-#define MII_SR_PREAMBLE_SUPPRESS 0x0040/* Preamble may be suppressed */
-#define MII_SR_EXTENDED_STATUS   0x0100/* Ext. status info in Reg 0x0F */
-#define MII_SR_100T2_HD_CAPS 0x0200/* 100T2 Half Duplex Capable */
-#define MII_SR_100T2_FD_CAPS 0x0400/* 100T2 Full Duplex Capable */
-#define MII_SR_10T_HD_CAPS   0x0800/* 10T   Half Duplex Capable */
-#define MII_SR_10T_FD_CAPS   0x1000/* 10T   Full Duplex Capable */
-#define MII_SR_100X_HD_CAPS  0x2000/* 100X  Half Duplex Capable */
-#define MII_SR_100X_FD_CAPS  0x4000/* 100X  Full Duplex Capable */
-#define MII_SR_100T4_CAPS0x8000/* 100T4 Capable */
-
 /* eTSEC */
 
 /* Number of register in the device */
diff --git a/hw/net/fsl_etsec/miim.c b/hw/net/fsl_etsec/miim.c
index 6bba01c82a..b48d2cb57b 100644
--- a/hw/net/fsl_etsec/miim.c
+++ b/hw/net/fsl_etsec/miim.c
@@ -23,6 +23,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "hw/net/mii.h"
 #include "etsec.h"
 #include "registers.h"
 
@@ -140,8 +141,8 @@ void etsec_miim_link_status(eTSEC *etsec, NetClientState *nc)
 {
 /* Set link status */
 if (nc->link_down) {
-etsec->phy_status &= ~MII_SR_LINK_STATUS;
+etsec->phy_status &= ~MII_BMSR_LINK_ST;
 } else {
-etsec->phy_status |= MII_SR_LINK_STATUS;
+etsec->phy_status |= MII_BMSR_LINK_ST;
 }
 }
diff --git a/include/hw/net/mii.h b/include/hw/net/mii.h
index c6a767a49a..ed1bb52b0f 100644
--- a/include/hw/net/mii.h
+++ b/include/hw/net/mii.h
@@ -55,6 +55,7 @@
 #define MII_BMCR_CTST   (1 << 7)  /* Collision test */
 #define MII_BMCR_SPEED1000  (1 << 6)  /* MSB of Speed (1000) */
 
+#define MII_BMSR_100T4  (1 << 15) /* Can do 100mbps T4 */
 #define MII_BMSR_100TX_FD   (1 << 14) /* Can do 100mbps, full-duplex */
 #define MII_BMSR_100TX_HD   (1 << 13) /* Can do 100mbps, half-duplex */
 #define MII_BMSR_10T_FD (1 << 12) /* Can do 10mbps, full-duplex */
-- 
2.39.0




[PATCH 14/31] e1000e: Configure ResettableClass

2023-01-12 Thread Akihiko Odaki
This is part of recent efforts of refactoring e1000 and e1000e.

DeviceClass's reset member is deprecated so migrate to ResettableClass.
There is no behavioral difference.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/e1000e.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index 0bc222d354..40a4b97938 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -513,9 +513,9 @@ static void e1000e_pci_uninit(PCIDevice *pci_dev)
 msi_uninit(pci_dev);
 }
 
-static void e1000e_qdev_reset(DeviceState *dev)
+static void e1000e_qdev_reset(Object *obj)
 {
-E1000EState *s = E1000E(dev);
+E1000EState *s = E1000E(obj);
 
 trace_e1000e_cb_qdev_reset();
 
@@ -669,6 +669,7 @@ static Property e1000e_properties[] = {
 static void e1000e_class_init(ObjectClass *class, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(class);
+ResettableClass *rc = RESETTABLE_CLASS(class);
 PCIDeviceClass *c = PCI_DEVICE_CLASS(class);
 
 c->realize = e1000e_pci_realize;
@@ -679,8 +680,9 @@ static void e1000e_class_init(ObjectClass *class, void *data)
 c->romfile = "efi-e1000e.rom";
 c->class_id = PCI_CLASS_NETWORK_ETHERNET;
 
+rc->phases.hold = e1000e_qdev_reset;
+
 dc->desc = "Intel 82574L GbE Controller";
-dc->reset = e1000e_qdev_reset;
 dc->vmsd = &e1000e_vmstate;
 
 e1000e_prop_disable_vnet = qdev_prop_uint8;
-- 
2.39.0




[PATCH 11/31] e1000e: Remove pending interrupt flags

2023-01-12 Thread Akihiko Odaki
They duplicate the running flags of the throttling timers and are
incomplete, as they are not cleared when the interrupts are fired or the
device is reset.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e.c  |  5 ++---
 hw/net/e1000e_core.c | 19 +++
 hw/net/e1000e_core.h |  2 --
 hw/net/trace-events  |  2 --
 4 files changed, 5 insertions(+), 23 deletions(-)
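
A rough model of the resulting control flow, using a hypothetical
DemoDelayTimer. What the real code does when the timer is not running lies
outside the hunks of this patch; the sketch simply assumes a throttling
window is opened at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool running;           /* a throttling window is currently open */
    unsigned notifications; /* stand-in for msix_notify() and friends */
} DemoDelayTimer;

/* Returns true when the interrupt has to wait for the timer to expire. */
bool
demo_postpone_interrupt(DemoDelayTimer *timer)
{
    assert(timer != NULL);
    if (timer->running) {
        /* The running flag alone records that something is postponed. */
        return true;
    }
    /* Assumption for this sketch: open a new throttling window here. */
    timer->running = true;
    return false;
}

/* Timer expiry: clear the flag and notify, with no separate pending check. */
void
demo_on_throttling_timer(DemoDelayTimer *timer)
{
    assert(timer != NULL);
    timer->running = false;
    timer->notifications++;
}
```

In this model the only state involved is the timer itself, so there is no
second flag that could go stale across an interrupt or a reset.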

diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index d591d01c07..0bc222d354 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -631,12 +631,11 @@ static const VMStateDescription e1000e_vmstate = {
 VMSTATE_E1000E_INTR_DELAY_TIMER(core.tidv, E1000EState),
 
 VMSTATE_E1000E_INTR_DELAY_TIMER(core.itr, E1000EState),
-VMSTATE_BOOL(core.itr_intr_pending, E1000EState),
+VMSTATE_UNUSED(1),
 
 VMSTATE_E1000E_INTR_DELAY_TIMER_ARRAY(core.eitr, E1000EState,
   E1000E_MSIX_VEC_NUM),
-VMSTATE_BOOL_ARRAY(core.eitr_intr_pending, E1000EState,
-   E1000E_MSIX_VEC_NUM),
+VMSTATE_UNUSED(E1000E_MSIX_VEC_NUM),
 
 VMSTATE_UINT32(core.itr_guest_value, E1000EState),
 VMSTATE_UINT32_ARRAY(core.eitr_guest_value, E1000EState,
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 87f964cdc1..37aec6a970 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -154,11 +154,6 @@ e1000e_intrmgr_on_throttling_timer(void *opaque)
 
 timer->running = false;
 
-if (!timer->core->itr_intr_pending) {
-trace_e1000e_irq_throttling_no_pending_interrupts();
-return;
-}
-
 if (msi_enabled(timer->core->owner)) {
 trace_e1000e_irq_msi_notify_postponed();
 /* Clear msi_causes_pending to fire MSI eventually */
@@ -180,11 +175,6 @@ e1000e_intrmgr_on_msix_throttling_timer(void *opaque)
 
 timer->running = false;
 
-if (!timer->core->eitr_intr_pending[idx]) {
-trace_e1000e_irq_throttling_no_pending_vec(idx);
-return;
-}
-
 trace_e1000e_irq_msix_notify_postponed_vec(idx);
 msix_notify(timer->core->owner, idx);
 }
@@ -2015,13 +2005,11 @@ e1000e_clear_ims_bits(E1000ECore *core, uint32_t bits)
 }
 
 static inline bool
-e1000e_postpone_interrupt(bool *interrupt_pending,
-   E1000IntrDelayTimer *timer)
+e1000e_postpone_interrupt(E1000IntrDelayTimer *timer)
 {
 if (timer->running) {
 trace_e1000e_irq_postponed_by_xitr(timer->delay_reg << 2);
 
-*interrupt_pending = true;
 return true;
 }
 
@@ -2035,14 +2023,13 @@ e1000e_postpone_interrupt(bool *interrupt_pending,
 static inline bool
 e1000e_itr_should_postpone(E1000ECore *core)
 {
-return e1000e_postpone_interrupt(&core->itr_intr_pending, &core->itr);
+return e1000e_postpone_interrupt(&core->itr);
 }
 
 static inline bool
 e1000e_eitr_should_postpone(E1000ECore *core, int idx)
 {
-return e1000e_postpone_interrupt(&core->eitr_intr_pending[idx],
- &core->eitr[idx]);
+return e1000e_postpone_interrupt(&core->eitr[idx]);
 }
 
 static void
diff --git a/hw/net/e1000e_core.h b/hw/net/e1000e_core.h
index b8f38c47a0..d0a14b4523 100644
--- a/hw/net/e1000e_core.h
+++ b/hw/net/e1000e_core.h
@@ -95,10 +95,8 @@ struct E1000Core {
 E1000IntrDelayTimer tidv;
 
 E1000IntrDelayTimer itr;
-bool itr_intr_pending;
 
 E1000IntrDelayTimer eitr[E1000E_MSIX_VEC_NUM];
-bool eitr_intr_pending[E1000E_MSIX_VEC_NUM];
 
 VMChangeStateEntry *vmstate;
 
diff --git a/hw/net/trace-events b/hw/net/trace-events
index 4c0ec3fda1..8fa4299704 100644
--- a/hw/net/trace-events
+++ b/hw/net/trace-events
@@ -201,10 +201,8 @@ e1000e_rx_metadata_ipv6_filtering_disabled(void) "IPv6 RX filtering disabled by
 e1000e_vlan_vet(uint16_t vet) "Setting VLAN ethernet type 0x%X"
 
 e1000e_irq_msi_notify(uint32_t cause) "MSI notify 0x%x"
-e1000e_irq_throttling_no_pending_interrupts(void) "No pending interrupts to notify"
 e1000e_irq_msi_notify_postponed(void) "Sending MSI postponed by ITR"
 e1000e_irq_legacy_notify_postponed(void) "Raising legacy IRQ postponed by ITR"
-e1000e_irq_throttling_no_pending_vec(int idx) "No pending interrupts for vector %d"
 e1000e_irq_msix_notify_postponed_vec(int idx) "Sending MSI-X postponed by EITR[%d]"
 e1000e_irq_legacy_notify(bool level) "IRQ line state: %d"
 e1000e_irq_msix_notify_vec(uint32_t vector) "MSI-X notify vector 0x%x"
-- 
2.39.0




[PATCH 26/31] tests/qtest/e1000e-test: Fabricate ethernet header

2023-01-12 Thread Akihiko Odaki
e1000e parses the Ethernet header, so fabricate a convincing one.

Signed-off-by: Akihiko Odaki 
---
 tests/qtest/e1000e-test.c   | 17 +++--
 tests/qtest/libqos/e1000e.h |  2 ++
 2 files changed, 13 insertions(+), 6 deletions(-)
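
One side effect worth spelling out: the fabricated header contains a zero
byte, so a string comparison would stop early and a byte-wise comparison is
needed. A minimal sketch, with a hypothetical demo_eth_header standing in
for struct eth_header:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ADDRESS { 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 }

/* Hypothetical 14-byte Ethernet header layout for this sketch. */
struct demo_eth_header {
    uint8_t h_dest[6];
    uint8_t h_source[6];
    uint16_t h_proto;
};

static_assert(sizeof(struct demo_eth_header) == 14, "no padding expected");

static const struct demo_eth_header demo_frame = {
    .h_dest = DEMO_ADDRESS,
    .h_source = DEMO_ADDRESS,
};

/* Compare the whole header byte by byte; embedded zeros are significant. */
bool
demo_payload_matches(const void *buffer, size_t len)
{
    return len >= sizeof(demo_frame) &&
           memcmp(buffer, &demo_frame, sizeof(demo_frame)) == 0;
}
```

The third address byte is 0x00, so a strcmp-style check would only ever look
at the first two bytes of the frame.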

diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c
index b63a4d3c91..98706355e3 100644
--- a/tests/qtest/e1000e-test.c
+++ b/tests/qtest/e1000e-test.c
@@ -27,6 +27,7 @@
 #include "qemu/osdep.h"
 #include "libqtest-single.h"
 #include "libqos/pci-pc.h"
+#include "net/eth.h"
 #include "qemu/sockets.h"
 #include "qemu/iov.h"
 #include "qemu/module.h"
@@ -35,9 +36,13 @@
 #include "libqos/e1000e.h"
 #include "hw/net/e1000_regs.h"
 
+static const struct eth_header test = {
+.h_dest = E1000E_ADDRESS,
+.h_source = E1000E_ADDRESS,
+};
+
 static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *alloc)
 {
-static const char test[] = "TEST";
 struct e1000_tx_desc descr;
 char buffer[64];
 int ret;
@@ -45,7 +50,7 @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
 
 /* Prepare test data buffer */
 uint64_t data = guest_alloc(alloc, sizeof(buffer));
-memwrite(data, test, sizeof(test));
+memwrite(data, &test, sizeof(test));
 
 /* Prepare TX descriptor */
 memset(&descr, 0, sizeof(descr));
@@ -71,7 +76,7 @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
 g_assert_cmpint(ret, == , sizeof(recv_len));
 ret = recv(test_sockets[0], buffer, sizeof(buffer), 0);
 g_assert_cmpint(ret, ==, sizeof(buffer));
-g_assert_cmpstr(buffer, == , test);
+g_assert_false(memcmp(buffer, &test, sizeof(test)));
 
 /* Free test data buffer */
 guest_free(alloc, data);
@@ -81,14 +86,14 @@ static void e1000e_receive_verify(QE1000E *d, int *test_sockets, QGuestAllocator
 {
 union e1000_rx_desc_extended descr;
 
-char test[] = "TEST";
+struct eth_header test_iov = test;
 int len = htonl(sizeof(test));
 struct iovec iov[] = {
 {
 .iov_base = &len,
 .iov_len = sizeof(len),
 },{
-.iov_base = test,
+.iov_base = &test_iov,
 .iov_len = sizeof(test),
 },
 };
@@ -119,7 +124,7 @@ static void e1000e_receive_verify(QE1000E *d, int *test_sockets, QGuestAllocator
 
 /* Check data sent to the backend */
 memread(data, buffer, sizeof(buffer));
-g_assert_cmpstr(buffer, == , test);
+g_assert_false(memcmp(buffer, &test, sizeof(test)));
 
 /* Free test data buffer */
 guest_free(alloc, data);
diff --git a/tests/qtest/libqos/e1000e.h b/tests/qtest/libqos/e1000e.h
index 091ce139da..5e2b201aa7 100644
--- a/tests/qtest/libqos/e1000e.h
+++ b/tests/qtest/libqos/e1000e.h
@@ -25,6 +25,8 @@
 #define E1000E_RX0_MSG_ID   (0)
 #define E1000E_TX0_MSG_ID   (1)
 
+#define E1000E_ADDRESS { 0x52, 0x54, 0x00, 0x12, 0x34, 0x56 }
+
 typedef struct QE1000E QE1000E;
 typedef struct QE1000E_PCI QE1000E_PCI;
 
-- 
2.39.0




[PATCH 16/31] e1000e: Set MII_ANER_NWAY

2023-01-12 Thread Akihiko Odaki
This keeps Windows driver 12.18.9.23 from generating an event with ID
30. The description of the event is as follows:
> Intel(R) 82574L Gigabit Network Connection
>  PROBLEM: The network adapter is configured for auto-negotiation but
> the link partner is not.  This may result in a duplex mismatch.
>  ACTION: Configure the link partner for auto-negotiation.

Signed-off-by: Akihiko Odaki 
---
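As a hedged illustration of what the added bit means to a driver: in the MII auto-negotiation expansion register (register 6), bit 0 is assumed here to report that the link partner is able to auto-negotiate, and bit 2 that the local device is next-page able. The DEMO_ constants and demo_partner_autonegotiates() are hypothetical names for this sketch, not QEMU's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed bit layout of the MII auto-negotiation expansion register.
 * The DEMO_ names are local to this sketch.
 */
#define DEMO_MII_ANER_NWAY (1u << 0) /* link partner can auto-negotiate */
#define DEMO_MII_ANER_NP   (1u << 2) /* local device is next-page able */

/* What a driver might conclude from the register value. */
static bool demo_partner_autonegotiates(uint16_t aner)
{
    return (aner & DEMO_MII_ANER_NWAY) != 0;
}
```

With only the next-page bit set, as before this patch, such a check fails, which would match the "link partner is not" wording of the Windows event quoted above.
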
 hw/net/e1000e_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index d8c17baf8f..736708407c 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -3426,7 +3426,7 @@ e1000e_phy_reg_init[E1000E_PHY_PAGES][E1000E_PHY_PAGE_SIZE] = {
 [MII_ANLPAR]= MII_ANLPAR_10 | MII_ANLPAR_10FD |
   MII_ANLPAR_TX | MII_ANLPAR_TXFD |
   MII_ANLPAR_T4 | MII_ANLPAR_PAUSE,
-[MII_ANER]  = MII_ANER_NP,
+[MII_ANER]  = MII_ANER_NP | MII_ANER_NWAY,
 [MII_ANNP]  = 1 | MII_ANNP_MP,
 [MII_CTRL1000]  = MII_CTRL1000_HALF | MII_CTRL1000_FULL |
   MII_CTRL1000_PORT | MII_CTRL1000_MASTER,
-- 
2.39.0




[PATCH 17/31] tests/qtest/e1000e-test: Fix the code style

2023-01-12 Thread Akihiko Odaki
The igb implementation starts off by copying e1000e code. Correct the
code style before that.

Signed-off-by: Akihiko Odaki 
---
 tests/qtest/e1000e-test.c   | 2 +-
 tests/qtest/libqos/e1000e.c | 6 --
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tests/qtest/e1000e-test.c b/tests/qtest/e1000e-test.c
index 3fc92046be..b63a4d3c91 100644
--- a/tests/qtest/e1000e-test.c
+++ b/tests/qtest/e1000e-test.c
@@ -1,4 +1,4 @@
- /*
+/*
  * QTest testcase for e1000e NIC
  *
  * Copyright (c) 2015 Ravello Systems LTD (http://ravellosystems.com)
diff --git a/tests/qtest/libqos/e1000e.c b/tests/qtest/libqos/e1000e.c
index 37c794b130..b90eb2d5e0 100644
--- a/tests/qtest/libqos/e1000e.c
+++ b/tests/qtest/libqos/e1000e.c
@@ -222,8 +222,10 @@ static void e1000e_register_nodes(void)
 .device_id = E1000_DEV_ID_82574L,
 };
 
-/* FIXME: every test using this node needs to setup a -netdev socket,id=hs0
- * otherwise QEMU is not going to start */
+/*
+ * FIXME: every test using this node needs to setup a -netdev socket,id=hs0
+ * otherwise QEMU is not going to start
+ */
 QOSGraphEdgeOptions opts = {
 .extra_device_opts = "netdev=hs0",
 };
-- 
2.39.0




[PATCH 23/31] igb: Rename identifiers

2023-01-12 Thread Akihiko Odaki
Rename identifiers of definitions which will be modified later for igb.
This will also allow building igb along with e1000e.

Signed-off-by: Gal Hammer 
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Akihiko Odaki 
---
 hw/net/igb.c|  368 +-
 hw/net/igb_common.h |6 +-
 hw/net/igb_core.c   | 1694 +--
 hw/net/igb_core.h   |   74 +-
 4 files changed, 1062 insertions(+), 1080 deletions(-)

diff --git a/hw/net/igb.c b/hw/net/igb.c
index d61efb781e..5d4c904cc5 100644
--- a/hw/net/igb.c
+++ b/hw/net/igb.c
@@ -48,17 +48,17 @@
 #include "hw/qdev-properties.h"
 #include "migration/vmstate.h"
 
-#include "e1000_common.h"
-#include "e1000e_core.h"
+#include "igb_common.h"
+#include "igb_core.h"
 
 #include "trace.h"
 #include "qapi/error.h"
 #include "qom/object.h"
 
-#define TYPE_E1000E "e1000e"
-OBJECT_DECLARE_SIMPLE_TYPE(E1000EState, E1000E)
+#define TYPE_IGB "igb"
+OBJECT_DECLARE_SIMPLE_TYPE(IGBState, IGB)
 
-struct E1000EState {
+struct IGBState {
 PCIDevice parent_obj;
 NICState *nic;
 NICConf conf;
@@ -78,7 +78,7 @@ struct E1000EState {
 
 bool disable_vnet;
 
-E1000ECore core;
+IGBCore core;
 bool init_vet;
 };
 
@@ -96,22 +96,21 @@ struct E1000EState {
 #define E1000E_MSIX_PBA (0x2000)
 
 static uint64_t
-e1000e_mmio_read(void *opaque, hwaddr addr, unsigned size)
+igb_mmio_read(void *opaque, hwaddr addr, unsigned size)
 {
-E1000EState *s = opaque;
-return e1000e_core_read(&s->core, addr, size);
+IGBState *s = opaque;
+return igb_core_read(&s->core, addr, size);
 }
 
 static void
-e1000e_mmio_write(void *opaque, hwaddr addr,
-   uint64_t val, unsigned size)
+igb_mmio_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 {
-E1000EState *s = opaque;
-e1000e_core_write(&s->core, addr, val, size);
+IGBState *s = opaque;
+igb_core_write(&s->core, addr, val, size);
 }
 
 static bool
-e1000e_io_get_reg_index(E1000EState *s, uint32_t *idx)
+igb_io_get_reg_index(IGBState *s, uint32_t *idx)
 {
 if (s->ioaddr < 0x1) {
 *idx = s->ioaddr;
@@ -133,9 +132,9 @@ e1000e_io_get_reg_index(E1000EState *s, uint32_t *idx)
 }
 
 static uint64_t
-e1000e_io_read(void *opaque, hwaddr addr, unsigned size)
+igb_io_read(void *opaque, hwaddr addr, unsigned size)
 {
-E1000EState *s = opaque;
+IGBState *s = opaque;
 uint32_t idx = 0;
 uint64_t val;
 
@@ -144,8 +143,8 @@ e1000e_io_read(void *opaque, hwaddr addr, unsigned size)
 trace_e1000e_io_read_addr(s->ioaddr);
 return s->ioaddr;
 case E1000_IODATA:
-if (e1000e_io_get_reg_index(s, &idx)) {
-val = e1000e_core_read(&s->core, idx, sizeof(val));
+if (igb_io_get_reg_index(s, &idx)) {
+val = igb_core_read(&s->core, idx, sizeof(val));
 trace_e1000e_io_read_data(idx, val);
 return val;
 }
@@ -157,10 +156,9 @@ e1000e_io_read(void *opaque, hwaddr addr, unsigned size)
 }
 
 static void
-e1000e_io_write(void *opaque, hwaddr addr,
-uint64_t val, unsigned size)
+igb_io_write(void *opaque, hwaddr addr, uint64_t val, unsigned size)
 {
-E1000EState *s = opaque;
+IGBState *s = opaque;
 uint32_t idx = 0;
 
 switch (addr) {
@@ -169,9 +167,9 @@ e1000e_io_write(void *opaque, hwaddr addr,
 s->ioaddr = (uint32_t) val;
 return;
 case E1000_IODATA:
-if (e1000e_io_get_reg_index(s, &idx)) {
+if (igb_io_get_reg_index(s, &idx)) {
 trace_e1000e_io_write_data(idx, val);
-e1000e_core_write(&s->core, idx, val, sizeof(val));
+igb_core_write(&s->core, idx, val, sizeof(val));
 }
 return;
 default:
@@ -181,8 +179,8 @@ e1000e_io_write(void *opaque, hwaddr addr,
 }
 
 static const MemoryRegionOps mmio_ops = {
-.read = e1000e_mmio_read,
-.write = e1000e_mmio_write,
+.read = igb_mmio_read,
+.write = igb_mmio_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
 .impl = {
 .min_access_size = 4,
@@ -191,8 +189,8 @@ static const MemoryRegionOps mmio_ops = {
 };
 
 static const MemoryRegionOps io_ops = {
-.read = e1000e_io_read,
-.write = e1000e_io_write,
+.read = igb_io_read,
+.write = igb_io_write,
 .endianness = DEVICE_LITTLE_ENDIAN,
 .impl = {
 .min_access_size = 4,
@@ -201,47 +199,47 @@ static const MemoryRegionOps io_ops = {
 };
 
 static bool
-e1000e_nc_can_receive(NetClientState *nc)
+igb_nc_can_receive(NetClientState *nc)
 {
-E1000EState *s = qemu_get_nic_opaque(nc);
-return e1000e_can_receive(&s->core);
+IGBState *s = qemu_get_nic_opaque(nc);
+return igb_can_receive(&s->core);
 }
 
 static ssize_t
-e1000e_nc_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
+igb_nc_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 {
-E1000EState *s = qemu_get_nic_opaque(nc);
-return e1000e_receive_iov(&s->core, iov, iovcnt);
+IGBSt

[PATCH 24/31] igb: Build igb

2023-01-12 Thread Akihiko Odaki
Currently igb functions identically to e1000e.

Signed-off-by: Gal Hammer 
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Akihiko Odaki 
---
 hw/net/Kconfig | 5 +
 hw/net/meson.build | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/hw/net/Kconfig b/hw/net/Kconfig
index 1cc1c5775e..18c7851efe 100644
--- a/hw/net/Kconfig
+++ b/hw/net/Kconfig
@@ -44,6 +44,11 @@ config E1000E_PCI_EXPRESS
 default y if PCI_DEVICES
 depends on PCI_EXPRESS && MSI_NONBROKEN
 
+config IGB_PCI_EXPRESS
+bool
+default y if PCI_DEVICES
+depends on PCI_EXPRESS && MSI_NONBROKEN
+
 config RTL8139_PCI
 bool
 default y if PCI_DEVICES
diff --git a/hw/net/meson.build b/hw/net/meson.build
index ebac261542..4974ad6bd2 100644
--- a/hw/net/meson.build
+++ b/hw/net/meson.build
@@ -10,6 +10,8 @@ softmmu_ss.add(when: 'CONFIG_PCNET_COMMON', if_true: files('pcnet.c'))
 softmmu_ss.add(when: 'CONFIG_E1000_PCI', if_true: files('e1000.c', 'e1000x_common.c'))
 softmmu_ss.add(when: 'CONFIG_E1000E_PCI_EXPRESS', if_true: files('net_tx_pkt.c', 'net_rx_pkt.c'))
 softmmu_ss.add(when: 'CONFIG_E1000E_PCI_EXPRESS', if_true: files('e1000e.c', 'e1000e_core.c', 'e1000x_common.c'))
+softmmu_ss.add(when: 'CONFIG_IGB_PCI_EXPRESS', if_true: files('net_tx_pkt.c', 'net_rx_pkt.c'))
+softmmu_ss.add(when: 'CONFIG_IGB_PCI_EXPRESS', if_true: files('igb.c', 'igb_core.c'))
 softmmu_ss.add(when: 'CONFIG_RTL8139_PCI', if_true: files('rtl8139.c'))
 softmmu_ss.add(when: 'CONFIG_TULIP', if_true: files('tulip.c'))
 softmmu_ss.add(when: 'CONFIG_VMXNET3_PCI', if_true: files('net_tx_pkt.c', 'net_rx_pkt.c'))
-- 
2.39.0




[PATCH 30/31] tests/avocado: Add igb test

2023-01-12 Thread Akihiko Odaki
This automates ethtool tests for igb registers, interrupts, etc.

Signed-off-by: Akihiko Odaki 
---
 MAINTAINERS   |  1 +
 .../org.centos/stream/8/x86_64/test-avocado   |  1 +
 tests/avocado/igb.py  | 38 +++
 3 files changed, 40 insertions(+)
 create mode 100644 tests/avocado/igb.py

diff --git a/MAINTAINERS b/MAINTAINERS
index d4a3b4f6db..5301c1908f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2212,6 +2212,7 @@ igb
 M: Akihiko Odaki 
 S: Maintained
 F: hw/net/igb*
+F: tests/avocado/igb.py
 F: tests/qtest/igb-test.c
 F: tests/qtest/libqos/igb.c
 
diff --git a/scripts/ci/org.centos/stream/8/x86_64/test-avocado b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
index 7aeecbcfb8..7e07dbcc89 100755
--- a/scripts/ci/org.centos/stream/8/x86_64/test-avocado
+++ b/scripts/ci/org.centos/stream/8/x86_64/test-avocado
@@ -37,6 +37,7 @@ make get-vm-images
 tests/avocado/cpu_queries.py:QueryCPUModelExpansion.test \
 tests/avocado/empty_cpu_model.py:EmptyCPUModel.test \
 tests/avocado/hotplug_cpu.py:HotPlugCPU.test \
+tests/avocado/igb.py:IGB.test \
 tests/avocado/info_usernet.py:InfoUsernet.test_hostfwd \
 tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu \
 tests/avocado/intel_iommu.py:IntelIOMMU.test_intel_iommu_pt \
diff --git a/tests/avocado/igb.py b/tests/avocado/igb.py
new file mode 100644
index 00..abf5dfa07f
--- /dev/null
+++ b/tests/avocado/igb.py
@@ -0,0 +1,38 @@
+# SPDX-License-Identifier: GPL-2.0-or-later
+# ethtool tests for igb registers, interrupts, etc
+
+from avocado_qemu import LinuxTest
+
+class IGB(LinuxTest):
+    """
+    :avocado: tags=accel:kvm
+    :avocado: tags=arch:x86_64
+    :avocado: tags=distro:fedora
+    :avocado: tags=distro_version:31
+    :avocado: tags=machine:q35
+    """
+
+    timeout = 180
+
+    def test(self):
+        self.require_accelerator('kvm')
+        kernel_url = self.distro.pxeboot_url + 'vmlinuz'
+        kernel_hash = '5b6f6876e1b5bda314f93893271da0d5777b1f3c'
+        kernel_path = self.fetch_asset(kernel_url, asset_hash=kernel_hash)
+        initrd_url = self.distro.pxeboot_url + 'initrd.img'
+        initrd_hash = 'dd0340a1b39bd28f88532babd4581c67649ec5b1'
+        initrd_path = self.fetch_asset(initrd_url, asset_hash=initrd_hash)
+
+        # Ideally we want to test MSI as well, but it is blocked by a bug
+        # fixed with:
+        # https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=28e96556baca7056d11d9fb3cdd0aba4483e00d8
+        kernel_params = self.distro.default_kernel_params + ' pci=nomsi'
+
+        self.vm.add_args('-kernel', kernel_path,
+                         '-initrd', initrd_path,
+                         '-append', kernel_params,
+                         '-accel', 'kvm',
+                         '-device', 'igb')
+        self.launch_and_wait()
+        self.ssh_command('dnf -y install ethtool')
+        self.ssh_command('ethtool -t eth1 offline')
-- 
2.39.0




[PATCH 08/31] e1000e: Use more constant definitions

2023-01-12 Thread Akihiko Odaki
The definitions of SW Semaphore Register were copied from:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/intel/e1000e/defines.h?h=v6.0.9#n374

Signed-off-by: Akihiko Odaki 
---
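A small sketch of the VLAN filter table lookup that this patch rewrites with named constants. The numeric values are taken from the literals the diff replaces (shift 5, masks 0x7f and 0x1f, 128 table entries); the DEMO_ names and both helper functions are hypothetical and only illustrate the arithmetic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the literals the patch replaces; names are local. */
#define DEMO_VFTA_ENTRY_SHIFT          5
#define DEMO_VFTA_ENTRY_MASK           0x7f
#define DEMO_VFTA_ENTRY_BIT_SHIFT_MASK 0x1f
#define DEMO_VLAN_FILTER_TBL_SIZE      128

/* Index of the 32-bit table entry that holds the bit for this VLAN ID. */
static uint32_t demo_vfta_index(uint16_t vid)
{
    return (vid >> DEMO_VFTA_ENTRY_SHIFT) & DEMO_VFTA_ENTRY_MASK;
}

/* Mark a VLAN ID as accepted in the filter table. */
static void demo_vfta_set(uint32_t *vfta, uint16_t vid)
{
    vfta[demo_vfta_index(vid)] |=
        UINT32_C(1) << (vid & DEMO_VFTA_ENTRY_BIT_SHIFT_MASK);
}

/* True if the bit for this VLAN ID is set, i.e. the frame passes. */
static bool demo_vfta_accepts(const uint32_t *vfta, uint16_t vid)
{
    uint32_t bit = UINT32_C(1) << (vid & DEMO_VFTA_ENTRY_BIT_SHIFT_MASK);

    return (vfta[demo_vfta_index(vid)] & bit) != 0;
}
```

For VLAN ID 100 this selects entry 3 and bit 4, the same result as the old `(vid >> 5) & 0x7f` and `vid & 0x1f` expressions.
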
 hw/net/e1000_regs.h  |  7 +++
 hw/net/e1000e_core.c | 49 
 2 files changed, 34 insertions(+), 22 deletions(-)

diff --git a/hw/net/e1000_regs.h b/hw/net/e1000_regs.h
index 3f6b5d0c52..6a36573802 100644
--- a/hw/net/e1000_regs.h
+++ b/hw/net/e1000_regs.h
@@ -525,6 +525,13 @@
 #define M88E1000_PHY_VCO_REG_BIT8  0x100 /* Bits 8 & 11 are adjusted for */
 #define M88E1000_PHY_VCO_REG_BIT11 0x800/* improved BER performance */
 
+/* SW Semaphore Register */
+#define E1000_SWSM_SMBI 0x0001 /* Driver Semaphore bit */
+#define E1000_SWSM_SWESMBI  0x0002 /* FW Semaphore bit */
+#define E1000_SWSM_DRV_LOAD 0x0008 /* Driver Loaded Bit */
+
+#define E1000_SWSM2_LOCK0x0002 /* Secondary driver semaphore bit */
+
 /* Interrupt Cause Read */
 #define E1000_ICR_TXDW  0x0001 /* Transmit desc written back */
 #define E1000_ICR_TXQE  0x0002 /* Transmit Queue empty */
diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index e6fc85ea51..6a4da72bd3 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -1022,10 +1022,11 @@ e1000e_receive_filter(E1000ECore *core, const uint8_t *buf, int size)
 
 if (e1000x_is_vlan_packet(buf, core->mac[VET]) &&
 e1000x_vlan_rx_filter_enabled(core->mac)) {
-uint16_t vid = lduw_be_p(buf + 14);
-uint32_t vfta = ldl_le_p((uint32_t *)(core->mac + VFTA) +
- ((vid >> 5) & 0x7f));
-if ((vfta & (1 << (vid & 0x1f))) == 0) {
+uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(buf)->h_tci);
+uint32_t vfta =
+ldl_le_p((uint32_t *)(core->mac + VFTA) +
+ ((vid >> E1000_VFTA_ENTRY_SHIFT) & E1000_VFTA_ENTRY_MASK));
+if ((vfta & (1 << (vid & E1000_VFTA_ENTRY_BIT_SHIFT_MASK))) == 0) {
 trace_e1000e_rx_flt_vlan_mismatch(vid);
 return false;
 } else {
@@ -1679,16 +1680,13 @@ e1000e_rx_fix_l4_csum(E1000ECore *core, struct NetRxPkt *pkt)
 }
 }
 
-/* Min. octets in an ethernet frame sans FCS */
-#define MIN_BUF_SIZE 60
-
 ssize_t
 e1000e_receive_iov(E1000ECore *core, const struct iovec *iov, int iovcnt)
 {
-static const int maximum_ethernet_hdr_len = (14 + 4);
+static const int maximum_ethernet_hdr_len = (ETH_HLEN + 4);
 
 uint32_t n = 0;
-uint8_t min_buf[MIN_BUF_SIZE];
+uint8_t min_buf[ETH_ZLEN];
 struct iovec min_iov;
 uint8_t *filter_buf;
 size_t size, orig_size;
@@ -2627,7 +2625,7 @@ static uint32_t
 e1000e_mac_swsm_read(E1000ECore *core, int index)
 {
 uint32_t val = core->mac[SWSM];
-core->mac[SWSM] = val | 1;
+core->mac[SWSM] = val | E1000_SWSM_SMBI;
 return val;
 }
 
@@ -3092,8 +3090,8 @@ static const readops e1000e_macreg_readops[] = {
 [IP4AT ... IP4AT + 6]  = e1000e_mac_readreg,
 [RA ... RA + 31]   = e1000e_mac_readreg,
 [WUPM ... WUPM + 31]   = e1000e_mac_readreg,
-[MTA ... MTA + 127]= e1000e_mac_readreg,
-[VFTA ... VFTA + 127]  = e1000e_mac_readreg,
+[MTA ... MTA + E1000_MC_TBL_SIZE - 1] = e1000e_mac_readreg,
+[VFTA ... VFTA + E1000_VLAN_FILTER_TBL_SIZE - 1]  = e1000e_mac_readreg,
 [FFMT ... FFMT + 254]  = e1000e_mac_readreg,
 [FFVT ... FFVT + 254]  = e1000e_mac_readreg,
 [MDEF ... MDEF + 7]= e1000e_mac_readreg,
@@ -3245,8 +3243,8 @@ static const writeops e1000e_macreg_writeops[] = {
 [IP4AT ... IP4AT + 6]= e1000e_mac_writereg,
 [RA + 2 ... RA + 31] = e1000e_mac_writereg,
 [WUPM ... WUPM + 31] = e1000e_mac_writereg,
-[MTA ... MTA + 127]  = e1000e_mac_writereg,
-[VFTA ... VFTA + 127]= e1000e_mac_writereg,
+[MTA ... MTA + E1000_MC_TBL_SIZE - 1] = e1000e_mac_writereg,
+[VFTA ... VFTA + E1000_VLAN_FILTER_TBL_SIZE - 1]= e1000e_mac_writereg,
 [FFMT ... FFMT + 254]= e1000e_set_4bit,
 [FFVT ... FFVT + 254]= e1000e_mac_writereg,
 [PBM ... PBM + 10239]= e1000e_mac_writereg,
@@ -3276,7 +3274,7 @@ static const uint16_t mac_reg_access[E1000E_MAC_SIZE] = {
 [TDH_A]   = 0x0cf8, [TDT_A]   = 0x0cf8, [TIDV_A] = 0x0cf8,
 [TDFH_A]  = 0xed00, [TDFT_A]  = 0xed00,
 [RA_A ... RA_A + 31]  = 0x14f0,
-[VFTA_A ... VFTA_A + 127] = 0x1400,
+[VFTA_A ... VFTA_A + E1000_VLAN_FILTER_TBL_SIZE - 1] = 0x1400,
 [RDBAL0_A ... RDLEN0_A] = 0x09bc,
 [TDBAL_A ... TDLEN_A]   = 0x0cf8,
 /* Access options */
@@ -3433,13 +3431,20 @@ e1000e_phy_reg_init[E1000E_PHY_PAGES][E1000E_PHY_PAGE_SIZE] = {
 
 [MII_PHYID1]= 0x141,
 [MII_PHYID2]= E1000_PHY_ID2_82574x,
-[MII_ANAR]  = 0xde1,
-[MII_ANLPAR]= 0x7e0,
-[MII_ANER]  = BIT(2),
-[MII_ANNP] 

[PATCH 18/31] tests/qtest/libqos/e1000e: Remove duplicate register definitions

2023-01-12 Thread Akihiko Odaki
The register definitions in tests/qtest/libqos/e1000e.h had names
different from those in hw/net/e1000_regs.h, which made it hard to see
which test code corresponds to which part of the implementation. Use
hw/net/e1000_regs.h from tests/qtest/libqos/e1000e.c to remove
these duplications.

Signed-off-by: Akihiko Odaki 
---
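For readers following the helpers this patch touches, here is a minimal sketch of the tail-advance arithmetic used by e1000e_tx_ring_push() and e1000e_rx_ring_push(): the ring length in descriptors is the byte-length register divided by the descriptor size, and the tail wraps modulo that length. The register file is modelled as a plain struct, the 16-byte descriptor size is an assumption, and every demo_ name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_DESC_LEN 16u /* assumed descriptor size in bytes */

/* Models the length and tail registers of one descriptor ring. */
struct demo_ring_regs {
    uint32_t len_bytes; /* stands in for TDLEN or RDLEN */
    uint32_t tail;      /* stands in for TDT or RDT */
};

/* Return the slot to fill and advance the tail, wrapping at the ring end. */
static uint32_t demo_ring_push(struct demo_ring_regs *r)
{
    uint32_t len = r->len_bytes / DEMO_RING_DESC_LEN;
    uint32_t slot = r->tail;

    assert(len > 0 && slot < len);
    r->tail = (slot + 1) % len;
    return slot;
}
```

The renaming in the diff below does not change this arithmetic; it only makes the register names match hw/net/e1000_regs.h.
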
 tests/qtest/libqos/e1000e.c | 20 ++--
 tests/qtest/libqos/e1000e.h |  5 -
 2 files changed, 10 insertions(+), 15 deletions(-)

diff --git a/tests/qtest/libqos/e1000e.c b/tests/qtest/libqos/e1000e.c
index b90eb2d5e0..28fb3052aa 100644
--- a/tests/qtest/libqos/e1000e.c
+++ b/tests/qtest/libqos/e1000e.c
@@ -51,13 +51,13 @@ static uint32_t e1000e_macreg_read(QE1000E *d, uint32_t reg)
 void e1000e_tx_ring_push(QE1000E *d, void *descr)
 {
 QE1000E_PCI *d_pci = container_of(d, QE1000E_PCI, e1000e);
-uint32_t tail = e1000e_macreg_read(d, E1000E_TDT);
-uint32_t len = e1000e_macreg_read(d, E1000E_TDLEN) / E1000_RING_DESC_LEN;
+uint32_t tail = e1000e_macreg_read(d, E1000_TDT);
+uint32_t len = e1000e_macreg_read(d, E1000_TDLEN) / E1000_RING_DESC_LEN;
 
 qtest_memwrite(d_pci->pci_dev.bus->qts,
d->tx_ring + tail * E1000_RING_DESC_LEN,
descr, E1000_RING_DESC_LEN);
-e1000e_macreg_write(d, E1000E_TDT, (tail + 1) % len);
+e1000e_macreg_write(d, E1000_TDT, (tail + 1) % len);
 
 /* Read WB data for the packet transmitted */
 qtest_memread(d_pci->pci_dev.bus->qts,
@@ -68,13 +68,13 @@ void e1000e_tx_ring_push(QE1000E *d, void *descr)
 void e1000e_rx_ring_push(QE1000E *d, void *descr)
 {
 QE1000E_PCI *d_pci = container_of(d, QE1000E_PCI, e1000e);
-uint32_t tail = e1000e_macreg_read(d, E1000E_RDT);
-uint32_t len = e1000e_macreg_read(d, E1000E_RDLEN) / E1000_RING_DESC_LEN;
+uint32_t tail = e1000e_macreg_read(d, E1000_RDT);
+uint32_t len = e1000e_macreg_read(d, E1000_RDLEN) / E1000_RING_DESC_LEN;
 
 qtest_memwrite(d_pci->pci_dev.bus->qts,
d->rx_ring + tail * E1000_RING_DESC_LEN,
descr, E1000_RING_DESC_LEN);
-e1000e_macreg_write(d, E1000E_RDT, (tail + 1) % len);
+e1000e_macreg_write(d, E1000_RDT, (tail + 1) % len);
 
 /* Read WB data for the packet received */
 qtest_memread(d_pci->pci_dev.bus->qts,
@@ -145,8 +145,8 @@ static void e1000e_pci_start_hw(QOSGraphObject *obj)
(uint32_t) d->e1000e.tx_ring);
 e1000e_macreg_write(&d->e1000e, E1000_TDBAH,
(uint32_t) (d->e1000e.tx_ring >> 32));
-e1000e_macreg_write(&d->e1000e, E1000E_TDLEN, E1000E_RING_LEN);
-e1000e_macreg_write(&d->e1000e, E1000E_TDT, 0);
+e1000e_macreg_write(&d->e1000e, E1000_TDLEN, E1000E_RING_LEN);
+e1000e_macreg_write(&d->e1000e, E1000_TDT, 0);
 e1000e_macreg_write(&d->e1000e, E1000_TDH, 0);
 
 /* Enable transmit */
@@ -156,8 +156,8 @@ static void e1000e_pci_start_hw(QOSGraphObject *obj)
(uint32_t)d->e1000e.rx_ring);
 e1000e_macreg_write(&d->e1000e, E1000_RDBAH,
(uint32_t)(d->e1000e.rx_ring >> 32));
-e1000e_macreg_write(&d->e1000e, E1000E_RDLEN, E1000E_RING_LEN);
-e1000e_macreg_write(&d->e1000e, E1000E_RDT, 0);
+e1000e_macreg_write(&d->e1000e, E1000_RDLEN, E1000E_RING_LEN);
+e1000e_macreg_write(&d->e1000e, E1000_RDT, 0);
 e1000e_macreg_write(&d->e1000e, E1000_RDH, 0);
 
 /* Enable receive */
diff --git a/tests/qtest/libqos/e1000e.h b/tests/qtest/libqos/e1000e.h
index 3bf285af42..091ce139da 100644
--- a/tests/qtest/libqos/e1000e.h
+++ b/tests/qtest/libqos/e1000e.h
@@ -25,11 +25,6 @@
 #define E1000E_RX0_MSG_ID   (0)
 #define E1000E_TX0_MSG_ID   (1)
 
-#define E1000E_TDLEN(0x3808)
-#define E1000E_TDT  (0x3818)
-#define E1000E_RDLEN(0x2808)
-#define E1000E_RDT  (0x2818)
-
 typedef struct QE1000E QE1000E;
 typedef struct QE1000E_PCI QE1000E_PCI;
 
-- 
2.39.0




[PATCH 06/31] e1000e: Mask registers when writing

2023-01-12 Thread Akihiko Odaki
When a register has fewer effective bits than its width, the old code
inconsistently masked either when writing or when reading. Make the code
consistent by always masking when writing, and remove some code duplication.

Signed-off-by: Akihiko Odaki 
---
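The mask-on-write approach can be sketched as follows: a macro generates one setter per bit width, and because the stored value is already masked, the read side becomes a plain load. This mirrors the shape of E1000E_LOW_BITS_SET_FUNC in the diff below, but the register file is a bare uint32_t array and all demo_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_BIT(n) (UINT32_C(1) << (n))

/* Generate a setter that keeps only the low `num` bits of the value. */
#define DEMO_LOW_BITS_SET_FUNC(num)                               \
static void demo_set_##num##bit(uint32_t *mac, int index,         \
                                uint32_t val)                     \
{                                                                 \
    mac[index] = val & (DEMO_BIT(num) - 1);                       \
}

DEMO_LOW_BITS_SET_FUNC(4)
DEMO_LOW_BITS_SET_FUNC(13)

/* Because writes are masked, the read side can be a plain load. */
static uint32_t demo_readreg(const uint32_t *mac, int index)
{
    return mac[index];
}
```

One consequence of this design is that the stored register value never contains bits the register does not implement, so every reader sees the same masked value without needing its own mask.
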
 hw/net/e1000e_core.c | 94 +++-
 1 file changed, 40 insertions(+), 54 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 181c1e0c2a..e6fc85ea51 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -2440,17 +2440,19 @@ e1000e_set_fcrtl(E1000ECore *core, int index, uint32_t val)
 core->mac[FCRTL] = val & 0x8000FFF8;
 }
 
-static inline void
-e1000e_set_16bit(E1000ECore *core, int index, uint32_t val)
-{
-core->mac[index] = val & 0xffff;
-}
+#define E1000E_LOW_BITS_SET_FUNC(num)\
+static void  \
+e1000e_set_##num##bit(E1000ECore *core, int index, uint32_t val) \
+{\
+core->mac[index] = val & (BIT(num) - 1); \
+}
 
-static void
-e1000e_set_12bit(E1000ECore *core, int index, uint32_t val)
-{
-core->mac[index] = val & 0xfff;
-}
+E1000E_LOW_BITS_SET_FUNC(4)
+E1000E_LOW_BITS_SET_FUNC(6)
+E1000E_LOW_BITS_SET_FUNC(11)
+E1000E_LOW_BITS_SET_FUNC(12)
+E1000E_LOW_BITS_SET_FUNC(13)
+E1000E_LOW_BITS_SET_FUNC(16)
 
 static void
 e1000e_set_vet(E1000ECore *core, int index, uint32_t val)
@@ -2621,22 +2623,6 @@ e1000e_mac_ims_read(E1000ECore *core, int index)
 return core->mac[IMS];
 }
 
-#define E1000E_LOW_BITS_READ_FUNC(num)  \
-static uint32_t \
-e1000e_mac_low##num##_read(E1000ECore *core, int index) \
-{   \
-return core->mac[index] & (BIT(num) - 1);   \
-}   \
-
-#define E1000E_LOW_BITS_READ(num)   \
-e1000e_mac_low##num##_read
-
-E1000E_LOW_BITS_READ_FUNC(4);
-E1000E_LOW_BITS_READ_FUNC(6);
-E1000E_LOW_BITS_READ_FUNC(11);
-E1000E_LOW_BITS_READ_FUNC(13);
-E1000E_LOW_BITS_READ_FUNC(16);
-
 static uint32_t
 e1000e_mac_swsm_read(E1000ECore *core, int index)
 {
@@ -2930,7 +2916,19 @@ static const readops e1000e_macreg_readops[] = {
 e1000e_getreg(LATECOL),
 e1000e_getreg(SEQEC),
 e1000e_getreg(XONTXC),
+e1000e_getreg(AIT),
+e1000e_getreg(TDFH),
+e1000e_getreg(TDFT),
+e1000e_getreg(TDFHS),
+e1000e_getreg(TDFTS),
+e1000e_getreg(TDFPC),
 e1000e_getreg(WUS),
+e1000e_getreg(PBS),
+e1000e_getreg(RDFH),
+e1000e_getreg(RDFT),
+e1000e_getreg(RDFHS),
+e1000e_getreg(RDFTS),
+e1000e_getreg(RDFPC),
 e1000e_getreg(GORCL),
 e1000e_getreg(MGTPRC),
 e1000e_getreg(EERD),
@@ -3066,16 +3064,9 @@ static const readops e1000e_macreg_readops[] = {
 [MPTC]= e1000e_mac_read_clr4,
 [IAC] = e1000e_mac_read_clr4,
 [ICR] = e1000e_mac_icr_read,
-[RDFH]= E1000E_LOW_BITS_READ(13),
-[RDFHS]   = E1000E_LOW_BITS_READ(13),
-[RDFPC]   = E1000E_LOW_BITS_READ(13),
-[TDFH]= E1000E_LOW_BITS_READ(13),
-[TDFHS]   = E1000E_LOW_BITS_READ(13),
 [STATUS]  = e1000e_get_status,
 [TARC0]   = e1000e_get_tarc,
-[PBS] = E1000E_LOW_BITS_READ(6),
 [ICS] = e1000e_mac_ics_read,
-[AIT] = E1000E_LOW_BITS_READ(16),
 [TORH]= e1000e_mac_read_clr8,
 [GORCH]   = e1000e_mac_read_clr8,
 [PRC127]  = e1000e_mac_read_clr4,
@@ -3091,11 +3082,6 @@ static const readops e1000e_macreg_readops[] = {
 [BPTC]= e1000e_mac_read_clr4,
 [TSCTC]   = e1000e_mac_read_clr4,
 [ITR] = e1000e_mac_itr_read,
-[RDFT]= E1000E_LOW_BITS_READ(13),
-[RDFTS]   = E1000E_LOW_BITS_READ(13),
-[TDFPC]   = E1000E_LOW_BITS_READ(13),
-[TDFT]= E1000E_LOW_BITS_READ(13),
-[TDFTS]   = E1000E_LOW_BITS_READ(13),
 [CTRL]= e1000e_get_ctrl,
 [TARC1]   = e1000e_get_tarc,
 [SWSM]= e1000e_mac_swsm_read,
@@ -3108,10 +3094,10 @@ static const readops e1000e_macreg_readops[] = {
 [WUPM ... WUPM + 31]   = e1000e_mac_readreg,
 [MTA ... MTA + 127]= e1000e_mac_readreg,
 [VFTA ... VFTA + 127]  = e1000e_mac_readreg,
-[FFMT ... FFMT + 254]  = E1000E_LOW_BITS_READ(4),
+[FFMT ... FFMT + 254]  = e1000e_mac_readreg,
 [FFVT ... FFVT + 254]  = e1000e_mac_readreg,
 [MDEF ... MDEF + 7]= e1000e_mac_readreg,
-[FFLT ... FFLT + 10]   = E1000E_LOW_BITS_READ(11),
+[FFLT ... FFLT + 10]   = e1000e_mac_readreg,
 [FTFT ... FTFT + 254]  = e1000e_mac_readreg,
 [PBM ... PBM + 10239]  = e1000e_mac_readreg,
 [RETA ... RETA + 31]   = e1000e_mac_readreg,
@@ -3134,19 +3120,8 @@ static const writeops e1000e_macreg_writeops[] = {
 e1000e_putreg(LEDCTL),
 e1000e_putreg(FCAL),
 e1000e_putreg(FCRUC),
-e1000e_putreg(AIT),
-e1000

[PATCH 12/31] e1000e: Improve software reset

2023-01-12 Thread Akihiko Odaki
This change makes e1000e reset more state when a software reset is
triggered. The datasheet exempts some registers from software reset,
and this change implements that behavior accordingly.

Signed-off-by: Akihiko Odaki 
---
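A minimal sketch of the reset loop described above, assuming a tiny register file: on a software reset the exempt registers keep their values, while every other register is reloaded from the init table or cleared. The register indices, the init values and all demo_ names are made up for illustration; only the idea of skipping PBA, PBS and FLA when `sw` is true comes from the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register indices; the patch exempts PBA, PBS and FLA. */
enum {
    DEMO_CTRL, DEMO_PBA, DEMO_PBS, DEMO_FLA, DEMO_SCRATCH, DEMO_MAC_SIZE
};

/* Arbitrary initial values; shorter than the register file on purpose. */
static const uint32_t demo_reg_init[] = {
    [DEMO_CTRL] = 0x1,
    [DEMO_PBA]  = 0x1234,
};

static bool demo_survives_sw_reset(size_t i)
{
    return i == DEMO_PBA || i == DEMO_PBS || i == DEMO_FLA;
}

/* Reset the register file; `sw` selects software-reset semantics. */
static void demo_reset(uint32_t *mac, bool sw)
{
    size_t n_init = sizeof(demo_reg_init) / sizeof(demo_reg_init[0]);

    for (size_t i = 0; i < DEMO_MAC_SIZE; i++) {
        if (sw && demo_survives_sw_reset(i)) {
            continue;
        }
        mac[i] = i < n_init ? demo_reg_init[i] : 0;
    }
}
```

A per-register loop replaces the earlier memset() plus memcpy() pair because a bulk copy cannot leave individual registers untouched.
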
 hw/net/e1000e_core.c | 24 +++-
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 37aec6a970..b8670662c8 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -58,6 +58,8 @@
 static inline void
 e1000e_set_interrupt_cause(E1000ECore *core, uint32_t val);
 
+static void e1000e_reset(E1000ECore *core, bool sw);
+
 static inline void
 e1000e_process_ts_option(E1000ECore *core, struct e1000_tx_desc *dp)
 {
@@ -1882,7 +1884,7 @@ e1000e_set_ctrl(E1000ECore *core, int index, uint32_t val)
 
 if (val & E1000_CTRL_RST) {
 trace_e1000e_core_ctrl_sw_reset();
-e1000x_reset_mac_addr(core->owner_nic, core->mac, core->permanent_mac);
+e1000e_reset(core, true);
 }
 
 if (val & E1000_CTRL_PHY_RST) {
@@ -3488,8 +3490,7 @@ static const uint32_t e1000e_mac_reg_init[] = {
 [EITR...EITR + E1000E_MSIX_VEC_NUM - 1] = E1000E_MIN_XITR,
 };
 
-void
-e1000e_core_reset(E1000ECore *core)
+static void e1000e_reset(E1000ECore *core, bool sw)
 {
 int i;
 
@@ -3499,8 +3500,15 @@ e1000e_core_reset(E1000ECore *core)
 
 memset(core->phy, 0, sizeof core->phy);
 memcpy(core->phy, e1000e_phy_reg_init, sizeof e1000e_phy_reg_init);
-memset(core->mac, 0, sizeof core->mac);
-memcpy(core->mac, e1000e_mac_reg_init, sizeof e1000e_mac_reg_init);
+
+for (i = 0; i < E1000E_MAC_SIZE; i++) {
+if (sw && (i == PBA || i == PBS || i == FLA)) {
+continue;
+}
+
+core->mac[i] = i < ARRAY_SIZE(e1000e_mac_reg_init) ?
+   e1000e_mac_reg_init[i] : 0;
+}
 
 core->rxbuf_min_shift = 1 + E1000_RING_DESC_LEN_SHIFT;
 
@@ -3517,6 +3525,12 @@ e1000e_core_reset(E1000ECore *core)
 }
 }
 
+void
+e1000e_core_reset(E1000ECore *core)
+{
+e1000e_reset(core, false);
+}
+
 void e1000e_core_pre_save(E1000ECore *core)
 {
 int i;
-- 
2.39.0




[PATCH 00/31] Introduce igb

2023-01-12 Thread Akihiko Odaki
igb is a family of Intel's gigabit ethernet controllers. This series implements
82576 emulation in particular. You can see the last patch for the documentation.

Note that there is another effort to bring 82576 emulation. That series was
developed independently by Sriram Yagnaraman.
https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg04670.html

Patches 1 - 16 are general improvements for e1000 and e1000e.
Patches 17 - 18 are general improvements for e1000e test code.
Patches 19 - 21 make necessary modifications to existing files.
Patch 22 starts off implementing igb emulation by copying e1000e code.
Patch 23 renames things so that they won't collide with e1000e.
Patch 24 makes building igb possible.
Patch 25 actually transforms e1000e emulation code into igb emulation.
Patches 26 - 27 make modifications to existing files that are necessary for tests.
Patch 28 copies e1000e test code.
Patch 29 transforms e1000e test code into igb test code.
Patch 30 adds ethtool test automation.
Patch 31 adds the documentation.

The main reason why this series is so huge is that the early part of this series
includes general improvements for e1000e. They are placed before copying e1000e
code so we won't need to duplicate those changes for both e1000e and igb code
later. As their utility does not depend on the igb implementation, they can be
merged earlier if necessary.

It is also possible to merge the work from Sriram Yagnaraman earlier than
patch 18+ and to cherry-pick useful changes from those patches later.

I think there are several different ways to get the changes into the mainline.
I'm open to any options.

Also be aware that most of the e1000e patches have already been sent to the
mailing list. Links to Patchew are below:
03: https://patchew.org/QEMU/20221103060103.83363-1-akihiko.od...@daynix.com/
04: https://patchew.org/QEMU/20221125135254.54760-1-akihiko.od...@daynix.com/
05: https://patchew.org/QEMU/20221119054913.103803-1-akihiko.od...@daynix.com/
06: https://patchew.org/QEMU/20221119055304.105500-1-akihiko.od...@daynix.com/
08 includes: https://patchew.org/QEMU/20221119060156.110010-1-akihiko.od...@daynix.com/
10: https://patchew.org/QEMU/20221125140105.55925-1-akihiko.od...@daynix.com/
11: https://patchew.org/QEMU/20221125142608.58919-1-akihiko.od...@daynix.com/
13: https://patchew.org/QEMU/20221201095351.63392-1-akihiko.od...@daynix.com/
14: https://patchew.org/QEMU/20221201100113.64387-1-akihiko.od...@daynix.com/
15: https://patchew.org/QEMU/20230107143328.102534-1-akihiko.od...@daynix.com/

Akihiko Odaki (31):
  e1000e: Fix the code style
  hw/net: Add more MII definitions
  fsl_etsec: Use hw/net/mii.h
  e1000: Use hw/net/mii.h
  e1000: Mask registers when writing
  e1000e: Mask registers when writing
  e1000: Use more constant definitions
  e1000e: Use more constant definitions
  e1000: Use memcpy to initialize registers
  e1000e: Use memcpy to initialize registers
  e1000e: Remove pending interrupt flags
  e1000e: Improve software reset
  e1000: Configure ResettableClass
  e1000e: Configure ResettableClass
  e1000e: Introduce e1000_rx_desc_union
  e1000e: Set MII_ANER_NWAY
  tests/qtest/e1000e-test: Fix the code style
  tests/qtest/libqos/e1000e: Remove duplicate register definitions
  hw/net/net_tx_pkt: Introduce net_tx_pkt_get_eth_hdr
  pcie: Introduce pcie_sriov_num_vfs
  e1000: Split header files
  igb: Copy e1000e code
  igb: Rename identifiers
  igb: Build igb
  igb: Transform to 82576 implementation
  tests/qtest/e1000e-test: Fabricate ethernet header
  tests/qtest/libqos/e1000e: Export macreg functions
  tests/qtest/libqos/igb: Copy e1000e code
  tests/qtest/libqos/igb: Transform to igb tests
  tests/avocado: Add igb test
  docs/system/devices/igb: Add igb documentation

 MAINTAINERS   |9 +
 docs/system/device-emulation.rst  |1 +
 docs/system/devices/igb.rst   |   70 +
 hw/net/Kconfig|5 +
 hw/net/e1000.c|  250 +-
 hw/net/e1000_common.h |  104 +
 hw/net/e1000_regs.h   |  958 +---
 hw/net/e1000e.c   |   90 +-
 hw/net/e1000e_core.c  |  491 +-
 hw/net/e1000e_core.h  |   68 +-
 hw/net/e1000x_common.c|   12 +-
 hw/net/e1000x_common.h|  128 +-
 hw/net/e1000x_regs.h  |  940 
 hw/net/fsl_etsec/etsec.c  |   11 +-
 hw/net/fsl_etsec/etsec.h  |   17 -
 hw/net/fsl_etsec/miim.c   |5 +-
 hw/net/igb.c  |  594 +++
 hw/net/igb_common.h   |  146 +
 hw/net/igb_core.c | 3934 +
 hw/net/igb_core.h |  146 +
 hw/net/igb_regs.h |  624 +++
 hw/net/igbvf.c|  327 +

[PATCH 20/31] pcie: Introduce pcie_sriov_num_vfs

2023-01-12 Thread Akihiko Odaki
igb can use this function to change its behavior depending on the
number of virtual functions currently enabled.

Signed-off-by: Gal Hammer 
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Akihiko Odaki 
---
 hw/pci/pcie_sriov.c | 5 +
 include/hw/pci/pcie_sriov.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/hw/pci/pcie_sriov.c b/hw/pci/pcie_sriov.c
index 8e3faf1f59..5796654b98 100644
--- a/hw/pci/pcie_sriov.c
+++ b/hw/pci/pcie_sriov.c
@@ -300,3 +300,8 @@ PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n)
 }
 return NULL;
 }
+
+uint16_t pcie_sriov_num_vfs(PCIDevice *dev)
+{
+return dev->exp.sriov_pf.num_vfs;
+}
diff --git a/include/hw/pci/pcie_sriov.h b/include/hw/pci/pcie_sriov.h
index 80f5c84e75..072a583405 100644
--- a/include/hw/pci/pcie_sriov.h
+++ b/include/hw/pci/pcie_sriov.h
@@ -74,4 +74,7 @@ PCIDevice *pcie_sriov_get_pf(PCIDevice *dev);
  */
 PCIDevice *pcie_sriov_get_vf_at_index(PCIDevice *dev, int n);
 
+/* Returns the current number of virtual functions. */
+uint16_t pcie_sriov_num_vfs(PCIDevice *dev);
+
 #endif /* QEMU_PCIE_SRIOV_H */
-- 
2.39.0
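
As a rough illustration of how a device model might consume the accessor added in the patch above, here is a minimal self-contained sketch. The `Mock*` types and `mock_*` functions are hypothetical stand-ins for QEMU's `PCIDevice` and are not part of the patch; only the shape of the accessor mirrors `pcie_sriov_num_vfs()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the SR-IOV PF state kept inside PCIDevice. */
typedef struct MockSriovPF {
    uint16_t num_vfs;               /* number of VFs currently enabled */
} MockSriovPF;

/* Hypothetical stand-in for PCIDevice, reduced to the field used here. */
typedef struct MockPCIDevice {
    struct {
        MockSriovPF sriov_pf;
    } exp;
} MockPCIDevice;

/* Same shape as pcie_sriov_num_vfs(): return the current VF count. */
static uint16_t mock_sriov_num_vfs(const MockPCIDevice *dev)
{
    assert(dev != NULL);
    return dev->exp.sriov_pf.num_vfs;
}

/*
 * Hypothetical caller: a device that changes its behavior once at
 * least one virtual function is enabled.
 */
static bool mock_vfs_active(const MockPCIDevice *dev)
{
    return mock_sriov_num_vfs(dev) > 0;
}
```

Keeping the count behind an accessor means a caller does not need to know where the PF state lives inside the device structure.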




[PATCH 28/31] tests/qtest/libqos/igb: Copy e1000e code

2023-01-12 Thread Akihiko Odaki
Start off igb test implementation by copying e1000e code first as igb
resembles e1000e.

Signed-off-by: Akihiko Odaki 
---
 MAINTAINERS  |   2 +
 tests/qtest/igb-test.c   | 242 +++
 tests/qtest/libqos/igb.c | 226 
 3 files changed, 470 insertions(+)
 create mode 100644 tests/qtest/igb-test.c
 create mode 100644 tests/qtest/libqos/igb.c

diff --git a/MAINTAINERS b/MAINTAINERS
index a3b2f67f66..d4a3b4f6db 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2212,6 +2212,8 @@ igb
 M: Akihiko Odaki 
 S: Maintained
 F: hw/net/igb*
+F: tests/qtest/igb-test.c
+F: tests/qtest/libqos/igb.c
 
 eepro100
 M: Stefan Weil 
diff --git a/tests/qtest/igb-test.c b/tests/qtest/igb-test.c
new file mode 100644
index 00..98706355e3
--- /dev/null
+++ b/tests/qtest/igb-test.c
@@ -0,0 +1,242 @@
+/*
+ * QTest testcase for e1000e NIC
+ *
+ * Copyright (c) 2015 Ravello Systems LTD (http://ravellosystems.com)
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ * Dmitry Fleytman 
+ * Leonid Bloch 
+ * Yan Vugenfirer 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+
+#include "qemu/osdep.h"
+#include "libqtest-single.h"
+#include "libqos/pci-pc.h"
+#include "net/eth.h"
+#include "qemu/sockets.h"
+#include "qemu/iov.h"
+#include "qemu/module.h"
+#include "qemu/bitops.h"
+#include "libqos/libqos-malloc.h"
+#include "libqos/e1000e.h"
+#include "hw/net/e1000_regs.h"
+
+static const struct eth_header test = {
+.h_dest = E1000E_ADDRESS,
+.h_source = E1000E_ADDRESS,
+};
+
+static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *alloc)
+{
+struct e1000_tx_desc descr;
+char buffer[64];
+int ret;
+uint32_t recv_len;
+
+/* Prepare test data buffer */
+uint64_t data = guest_alloc(alloc, sizeof(buffer));
+memwrite(data, &test, sizeof(test));
+
+/* Prepare TX descriptor */
+memset(&descr, 0, sizeof(descr));
+descr.buffer_addr = cpu_to_le64(data);
+descr.lower.data = cpu_to_le32(E1000_TXD_CMD_RS   |
+   E1000_TXD_CMD_EOP  |
+   E1000_TXD_CMD_DEXT |
+   E1000_TXD_DTYP_D   |
+   sizeof(buffer));
+
+/* Put descriptor to the ring */
+e1000e_tx_ring_push(d, &descr);
+
+/* Wait for TX WB interrupt */
+e1000e_wait_isr(d, E1000E_TX0_MSG_ID);
+
+/* Check DD bit */
+g_assert_cmphex(le32_to_cpu(descr.upper.data) & E1000_TXD_STAT_DD, ==,
+E1000_TXD_STAT_DD);
+
+/* Check data sent to the backend */
+ret = recv(test_sockets[0], &recv_len, sizeof(recv_len), 0);
+g_assert_cmpint(ret, == , sizeof(recv_len));
+ret = recv(test_sockets[0], buffer, sizeof(buffer), 0);
+g_assert_cmpint(ret, ==, sizeof(buffer));
+g_assert_false(memcmp(buffer, &test, sizeof(test)));
+
+/* Free test data buffer */
+guest_free(alloc, data);
+}
+
+static void e1000e_receive_verify(QE1000E *d, int *test_sockets, QGuestAllocator *alloc)
+{
+union e1000_rx_desc_extended descr;
+
+struct eth_header test_iov = test;
+int len = htonl(sizeof(test));
+struct iovec iov[] = {
+{
+.iov_base = &len,
+.iov_len = sizeof(len),
+},{
+.iov_base = &test_iov,
+.iov_len = sizeof(test),
+},
+};
+
+char buffer[64];
+int ret;
+
+/* Send a dummy packet to device's socket*/
+ret = iov_send(test_sockets[0], iov, 2, 0, sizeof(len) + sizeof(test));
+g_assert_cmpint(ret, == , sizeof(test) + sizeof(len));
+
+/* Prepare test data buffer */
+uint64_t data = guest_alloc(alloc, sizeof(buffer));
+
+/* Prepare RX descriptor */
+memset(&descr, 0, sizeof(descr));
+descr.read.buffer_addr = cpu_to_le64(data);
+
+/* Put descriptor to the ring */
+e1000e_rx_ring_push(d, &descr);
+
+/* Wait for TX WB interrupt */
+e1000e_wait_isr(d, E1000E_RX0_MSG_ID);
+
+/* Check DD bit */
+g_assert_cmphex(le32_to_cpu(descr.wb.upper.status_error) &
+E1000_RXD_STAT_DD, ==, E1000_RXD_STAT_DD);
+
+/* Check data sent to the backend */
+memread(data, buffer, sizeof(buffer));
+g_assert_false(memcmp(buffer, &test, sizeof(test)));
+
+/* Free test data buffer */
+ 

[PATCH 10/31] e1000e: Use memcpy to intialize registers

2023-01-12 Thread Akihiko Odaki
Use memcpy instead of memmove to initialize registers. The initial
register templates and register table instances will never overlap.

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000e_core.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/net/e1000e_core.c b/hw/net/e1000e_core.c
index 6a4da72bd3..87f964cdc1 100644
--- a/hw/net/e1000e_core.c
+++ b/hw/net/e1000e_core.c
@@ -3511,9 +3511,9 @@ e1000e_core_reset(E1000ECore *core)
 e1000e_intrmgr_reset(core);
 
 memset(core->phy, 0, sizeof core->phy);
-memmove(core->phy, e1000e_phy_reg_init, sizeof e1000e_phy_reg_init);
+memcpy(core->phy, e1000e_phy_reg_init, sizeof e1000e_phy_reg_init);
 memset(core->mac, 0, sizeof core->mac);
-memmove(core->mac, e1000e_mac_reg_init, sizeof e1000e_mac_reg_init);
+memcpy(core->mac, e1000e_mac_reg_init, sizeof e1000e_mac_reg_init);
 
 core->rxbuf_min_shift = 1 + E1000_RING_DESC_LEN_SHIFT;
 
-- 
2.39.0
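
The reasoning in the commit message above (a constant template and a per-device register table can never overlap, so `memcpy` is sufficient) can be sketched as follows. This is a minimal sketch, not the e1000e code: the `Demo*` names, the register count, and the reset values are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NUM_REGS 8             /* hypothetical register file size */

/* Read-only template holding the non-zero reset values (made-up values). */
static const uint16_t demo_reg_init[] = {
    [0] = 0x1140,
    [1] = 0x796d,
};

/* The template must fit inside the register file. */
_Static_assert(sizeof(demo_reg_init) <= sizeof(uint16_t) * DEMO_NUM_REGS,
               "register template larger than register file");

typedef struct DemoCore {
    uint16_t regs[DEMO_NUM_REGS];
} DemoCore;

static void demo_core_reset(DemoCore *core)
{
    assert(core != NULL);

    /* Clear everything first so registers past the template read as 0. */
    memset(core->regs, 0, sizeof(core->regs));

    /*
     * The source is a static const array and the destination lives in
     * the device instance, so the two regions cannot overlap and memcpy
     * is safe; memmove would only be needed for overlapping regions.
     */
    memcpy(core->regs, demo_reg_init, sizeof(demo_reg_init));
}
```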




[PATCH v2 2/7] target/arm/sme: Rebuild hflags in set_pstate() helpers

2023-01-12 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Signed-off-by: Richard Henderson 
Message-Id: <20230112004322.161330-1-richard.hender...@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/sme_helper.c| 2 ++
 target/arm/translate-a64.c | 1 -
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index f891306bb9..b5aefa3eda 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -45,6 +45,7 @@ void helper_set_pstate_sm(CPUARMState *env, uint32_t i)
 }
 env->svcr ^= R_SVCR_SM_MASK;
 arm_reset_sve_state(env);
+arm_rebuild_hflags(env);
 }
 
 void helper_set_pstate_za(CPUARMState *env, uint32_t i)
@@ -65,6 +66,7 @@ void helper_set_pstate_za(CPUARMState *env, uint32_t i)
 if (i) {
 memset(env->zarray, 0, sizeof(env->zarray));
 }
+arm_rebuild_hflags(env);
 }
 
 void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 35cc851246..035e63bdc5 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1855,7 +1855,6 @@ static void handle_msr_i(DisasContext *s, uint32_t insn,
 if ((crm & 4) && i != s->pstate_za) {
 gen_helper_set_pstate_za(cpu_env, tcg_constant_i32(i));
 }
-gen_rebuild_hflags(s);
 } else {
 s->base.is_jmp = DISAS_NEXT;
 }
-- 
2.38.1




[PATCH 13/31] e1000: Configure ResettableClass

2023-01-12 Thread Akihiko Odaki
This is part of recent efforts of refactoring e1000 and e1000e.

DeviceClass's reset member is deprecated so migrate to ResettableClass.
There is no behavioral difference.

Signed-off-by: Akihiko Odaki 
Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
---
 hw/net/e1000.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 1bcc0cd4f3..a66cb39c8b 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -1731,9 +1731,9 @@ static void pci_e1000_realize(PCIDevice *pci_dev, Error **errp)
 e1000_flush_queue_timer, d);
 }
 
-static void qdev_e1000_reset(DeviceState *dev)
+static void qdev_e1000_reset(Object *obj)
 {
-E1000State *d = E1000(dev);
+E1000State *d = E1000(obj);
 e1000_reset(d);
 }
 
@@ -1762,6 +1762,7 @@ typedef struct E1000Info {
 static void e1000_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
+ResettableClass *rc = RESETTABLE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 E1000BaseClass *e = E1000_CLASS(klass);
 const E1000Info *info = data;
@@ -1774,9 +1775,9 @@ static void e1000_class_init(ObjectClass *klass, void *data)
 k->revision = info->revision;
 e->phy_id2 = info->phy_id2;
 k->class_id = PCI_CLASS_NETWORK_ETHERNET;
+rc->phases.hold = qdev_e1000_reset;
 set_bit(DEVICE_CATEGORY_NETWORK, dc->categories);
 dc->desc = "Intel Gigabit Ethernet";
-dc->reset = qdev_e1000_reset;
 dc->vmsd = &vmstate_e1000;
 device_class_set_props(dc, e1000_properties);
 }
-- 
2.39.0
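
The pattern in the patch above (the reset callback takes a generic object pointer and is registered as the "hold" phase on the class) can be sketched without any QEMU headers. Everything named `Demo*` or `demo_*` below is a hypothetical, heavily simplified stand-in; the real `Object`, `ResettableClass`, and cast macros carry far more machinery than this.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a generic base object. */
typedef struct DemoObject {
    int unused;
} DemoObject;

/* Hypothetical stand-in for a class with a per-phase reset callback. */
typedef struct DemoResettableClass {
    struct {
        void (*hold)(DemoObject *obj);  /* runs while the device is in reset */
    } phases;
} DemoResettableClass;

typedef struct DemoNic {
    DemoObject parent;      /* first member, so the cast below is valid */
    unsigned reset_count;
    unsigned link_up;
} DemoNic;

/* Reset callback: receives the generic object and casts to the device. */
static void demo_nic_reset_hold(DemoObject *obj)
{
    DemoNic *nic = (DemoNic *)obj;

    assert(obj != NULL);
    nic->link_up = 0;
    nic->reset_count++;
}

/* Class init: register the callback on the class, not on each device. */
static void demo_nic_class_init(DemoResettableClass *rc)
{
    rc->phases.hold = demo_nic_reset_hold;
}

/* Hypothetical reset driver: invoke the hold phase if one is registered. */
static void demo_reset(const DemoResettableClass *rc, DemoObject *obj)
{
    if (rc->phases.hold != NULL) {
        rc->phases.hold(obj);
    }
}
```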




[PATCH v2 7/7] target/arm/sme: Unify set_pstate() SM/ZA helpers as set_svcr()

2023-01-12 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Unify the two helper_set_pstate_{sm,za} in this function.
Do not call helper_* functions from svcr_write.

Signed-off-by: Richard Henderson 
Message-Id: <20230112004322.161330-1-richard.hender...@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/helper-sme.h|  3 +--
 target/arm/helper.c|  2 --
 target/arm/sme_helper.c|  9 ++---
 target/arm/translate-a64.c | 10 ++
 4 files changed, 5 insertions(+), 19 deletions(-)

diff --git a/target/arm/helper-sme.h b/target/arm/helper-sme.h
index d2d544a696..27eef49a11 100644
--- a/target/arm/helper-sme.h
+++ b/target/arm/helper-sme.h
@@ -17,8 +17,7 @@
  * License along with this library; if not, see .
  */
 
-DEF_HELPER_FLAGS_2(set_pstate_sm, TCG_CALL_NO_RWG, void, env, i32)
-DEF_HELPER_FLAGS_2(set_pstate_za, TCG_CALL_NO_RWG, void, env, i32)
+DEF_HELPER_FLAGS_3(set_svcr, TCG_CALL_NO_RWG, void, env, i32, i32)
 
 DEF_HELPER_FLAGS_3(sme_zero, TCG_CALL_NO_RWG, void, env, i32, i32)
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index cf77bdd378..1d74b95971 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6762,8 +6762,6 @@ void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)
 static void svcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
-helper_set_pstate_sm(env, FIELD_EX64(value, SVCR, SM));
-helper_set_pstate_za(env, FIELD_EX64(value, SVCR, ZA));
 aarch64_set_svcr(env, value, -1);
 }
 
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index 3abe03e4cb..1e67fcac30 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -29,14 +29,9 @@
 #include "vec_internal.h"
 #include "sve_ldst_internal.h"
 
-void helper_set_pstate_sm(CPUARMState *env, uint32_t i)
+void helper_set_svcr(CPUARMState *env, uint32_t val, uint32_t mask)
 {
-aarch64_set_svcr(env, 0, R_SVCR_SM_MASK);
-}
-
-void helper_set_pstate_za(CPUARMState *env, uint32_t i)
-{
-aarch64_set_svcr(env, 0, R_SVCR_ZA_MASK);
+aarch64_set_svcr(env, val, mask);
 }
 
 void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl)
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 035e63bdc5..19cf371c4c 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1847,14 +1847,8 @@ static void handle_msr_i(DisasContext *s, uint32_t insn,
 
 if ((old ^ new) & msk) {
 /* At least one bit changes. */
-bool i = crm & 1;
-
-if ((crm & 2) && i != s->pstate_sm) {
-gen_helper_set_pstate_sm(cpu_env, tcg_constant_i32(i));
-}
-if ((crm & 4) && i != s->pstate_za) {
-gen_helper_set_pstate_za(cpu_env, tcg_constant_i32(i));
-}
+gen_helper_set_svcr(cpu_env, tcg_constant_i32(new),
+tcg_constant_i32(msk));
 } else {
 s->base.is_jmp = DISAS_NEXT;
 }
-- 
2.38.1




Re: [PATCH v4 4/4] scripts: add script to compare compatible properties

2023-01-12 Thread Maksim Davydov



On 12/12/22 13:53, Dr. David Alan Gilbert wrote:

* Maksim Davydov (davydov-...@yandex-team.ru) wrote:

This script runs QEMU to obtain the compat_props of machines and the default
values of different types, and produces a corresponding table. This table
can be used to compare machine types and choose the most suitable
machine. The table, in json or csv format, should also be used to check that
a new machine doesn't affect previous ones, by comparing the tables generated
with and without the new machine.
Default values of properties are needed to fill "holes" in the table (one
machine has a property and another does not). For instance, the 2.12 mt has
`{ "EPYC-" TYPE_X86_CPU, "xlevel", "0x800a" }`, but the compat_props of the
3.1 mt don't have it. So, to compare these machines we need to fill in the
unknown value of "EPYC-x86_64-cpu-xlevel" for the 3.1 mt. I call this unknown
value in the table a "hole". To get the (default) values for these
"holes", the script uses a list of appropriate methods.

Notes:
* Some init values from devices may be unavailable, like properties
   from virtio-9p when configure has --disable-virtfs. Such cases
   appear in the table as "unavailable driver".
* Default values can be obtained in a non-obvious way, like x86 features.
   If the script doesn't know how to get a property's default value to compare
   one machine with another, it fills the "holes" with "unavailable method".
   This is done because the script uses a whitelist model to get default
   values of different types: a method applied to a new type it was not
   written for could crash the script. It is better to get an "unavailable
   driver" when creating a new machine with new compatible properties than
   to break this script. The result is a more stable and generic script.
* If the default value can't be obtained because the property doesn't
   exist or because the property can't have a default value, the
   "hole" is filled with "unknown property" or "no default value".
* If the property is applied to an abstract class, the script collects
   default values from all child classes (a set of default values).

Nice;  just a suggestion - have you considered adding a flag to specify
the qemu binaries separately, so that you can compare the machine type
definitions of two qemu binaries for the same machine type?

Dave

Yes, it's a good idea. I'll try to add functionality like this in the
future, but in this patch series I want to focus on QAPI and general
features (that can help to compare machines and choose the appropriate
one). After that I'll implement mt checks. It seems that somewhere in this
place there may be a need to compare 2 different qemu binaries.

Thanks for reviewing,
Maksim Davydov

Example: ./scripts/compare_mt.py --mt pc-q35-3.1 pc-q35-4.0

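╒════════════════════════════════════════╤════════════╤════════════════════╕
│                                        │ pc-q35-3.1 │ pc-q35-4.0         │
╞════════════════════════════════════════╪════════════╪════════════════════╡
│ Cascadelake-Server-x86_64-cpu:mpx      │ True       │ False              │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Cascadelake-Server-x86_64-cpu:stepping │ 5          │ 6                  │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Icelake-Client-x86_64-cpu:mpx          │ True       │ unavailable driver │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Icelake-Server-x86_64-cpu:mpx          │ True       │ False              │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Opteron_G3-x86_64-cpu:rdtscp           │ False      │ True               │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Opteron_G4-x86_64-cpu:rdtscp           │ False      │ True               │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Opteron_G5-x86_64-cpu:rdtscp           │ False      │ True               │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Skylake-Client-IBRS-x86_64-cpu:mpx     │ True       │ False              │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Skylake-Client-x86_64-cpu:mpx          │ True       │ False              │
├────────────────────────────────────────┼────────────┼────────────────────┤
│ Skylake-Server-IBRS-x86_64-cpu:mpx     │ True       │ False              │
├────────────────────────────────────────┼────────────┼────────────────────┤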

[PATCH 29/31] tests/qtest/libqos/igb: Transform to igb tests

2023-01-12 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
---
 tests/qtest/fuzz/generic_fuzz_configs.h |   5 +
 tests/qtest/igb-test.c  |  67 ++--
 tests/qtest/libqos/igb.c| 139 +---
 tests/qtest/libqos/meson.build  |   1 +
 tests/qtest/meson.build |   1 +
 5 files changed, 90 insertions(+), 123 deletions(-)

diff --git a/tests/qtest/fuzz/generic_fuzz_configs.h b/tests/qtest/fuzz/generic_fuzz_configs.h
index a825b78c14..50689da653 100644
--- a/tests/qtest/fuzz/generic_fuzz_configs.h
+++ b/tests/qtest/fuzz/generic_fuzz_configs.h
@@ -90,6 +90,11 @@ const generic_fuzz_config predefined_configs[] = {
 .args = "-M q35 -nodefaults "
 "-device e1000e,netdev=net0 -netdev user,id=net0",
 .objects = "e1000e",
+},{
+.name = "igb",
+.args = "-M q35 -nodefaults "
+"-device igb,netdev=net0 -netdev user,id=net0",
+.objects = "igb",
 },{
 .name = "cirrus-vga",
 .args = "-machine q35 -nodefaults -device cirrus-vga",
diff --git a/tests/qtest/igb-test.c b/tests/qtest/igb-test.c
index 98706355e3..17d408f02a 100644
--- a/tests/qtest/igb-test.c
+++ b/tests/qtest/igb-test.c
@@ -1,10 +1,12 @@
 /*
- * QTest testcase for e1000e NIC
+ * QTest testcase for igb NIC
  *
+ * Copyright (c) 2022-2023 Red Hat, Inc.
  * Copyright (c) 2015 Ravello Systems LTD (http://ravellosystems.com)
  * Developed by Daynix Computing LTD (http://www.daynix.com)
  *
  * Authors:
+ * Akihiko Odaki 
  * Dmitry Fleytman 
  * Leonid Bloch 
  * Yan Vugenfirer 
@@ -34,16 +36,16 @@
 #include "qemu/bitops.h"
 #include "libqos/libqos-malloc.h"
 #include "libqos/e1000e.h"
-#include "hw/net/e1000_regs.h"
+#include "hw/net/igb_regs.h"
 
 static const struct eth_header test = {
 .h_dest = E1000E_ADDRESS,
 .h_source = E1000E_ADDRESS,
 };
 
-static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *alloc)
+static void igb_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *alloc)
 {
-struct e1000_tx_desc descr;
+union e1000_adv_tx_desc descr;
 char buffer[64];
 int ret;
 uint32_t recv_len;
@@ -54,12 +56,11 @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
 
 /* Prepare TX descriptor */
 memset(&descr, 0, sizeof(descr));
-descr.buffer_addr = cpu_to_le64(data);
-descr.lower.data = cpu_to_le32(E1000_TXD_CMD_RS   |
-   E1000_TXD_CMD_EOP  |
-   E1000_TXD_CMD_DEXT |
-   E1000_TXD_DTYP_D   |
-   sizeof(buffer));
+descr.read.buffer_addr = cpu_to_le64(data);
+descr.read.cmd_type_len = cpu_to_le32(E1000_TXD_CMD_RS   |
+  E1000_TXD_CMD_EOP  |
+  E1000_TXD_DTYP_D   |
+  sizeof(buffer));
 
 /* Put descriptor to the ring */
 e1000e_tx_ring_push(d, &descr);
@@ -68,7 +69,7 @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
 e1000e_wait_isr(d, E1000E_TX0_MSG_ID);
 
 /* Check DD bit */
-g_assert_cmphex(le32_to_cpu(descr.upper.data) & E1000_TXD_STAT_DD, ==,
+g_assert_cmphex(le32_to_cpu(descr.wb.status) & E1000_TXD_STAT_DD, ==,
 E1000_TXD_STAT_DD);
 
 /* Check data sent to the backend */
@@ -82,9 +83,9 @@ static void e1000e_send_verify(QE1000E *d, int *test_sockets, QGuestAllocator *a
 guest_free(alloc, data);
 }
 
-static void e1000e_receive_verify(QE1000E *d, int *test_sockets, QGuestAllocator *alloc)
+static void igb_receive_verify(QE1000E *d, int *test_sockets, QGuestAllocator *alloc)
 {
-union e1000_rx_desc_extended descr;
+union e1000_adv_rx_desc descr;
 
 struct eth_header test_iov = test;
 int len = htonl(sizeof(test));
@@ -110,7 +111,7 @@ static void e1000e_receive_verify(QE1000E *d, int *test_sockets, QGuestAllocator
 
 /* Prepare RX descriptor */
 memset(&descr, 0, sizeof(descr));
-descr.read.buffer_addr = cpu_to_le64(data);
+descr.read.pkt_addr = cpu_to_le64(data);
 
 /* Put descriptor to the ring */
 e1000e_rx_ring_push(d, &descr);
@@ -135,7 +136,7 @@ static void test_e1000e_init(void *obj, void *data, QGuestAllocator * alloc)
 /* init does nothing */
 }
 
-static void test_e1000e_tx(void *obj, void *data, QGuestAllocator * alloc)
+static void test_igb_tx(void *obj, void *data, QGuestAllocator * alloc)
 {
 QE1000E_PCI *e1000e = obj;
 QE1000E *d = &e1000e->e1000e;
@@ -147,10 +148,10 @@ static void test_e1000e_tx(void *obj, void *data, QGuestAllocator * alloc)
 return;
 }
 
-e1000e_send_verify(d, data, alloc);
+igb_send_verify(d, data, alloc);
 }
 
-static void test_e1000e_rx(void *obj, void *data, QGuestAllocator * alloc)
+static void test_igb_rx(void *obj, void *data, QGuestAllocator * alloc)
 {
 QE1000E_PCI *e1000e = o

[PATCH v2 0/7] target/arm: Introduce aarch64_set_svcr

2023-01-12 Thread Philippe Mathieu-Daudé
This is a respin of Richard's patch
https://lore.kernel.org/qemu-devel/20230112004322.161330-1-richard.hender...@linaro.org/
but split into multiple trivial changes, as I was having a hard
time understanding all the changes at once while reviewing it.

Richard Henderson (7):
  target/arm/sme: Reorg SME access handling in handle_msr_i()
  target/arm/sme: Rebuild hflags in set_pstate() helpers
  target/arm/sme: Introduce aarch64_set_svcr()
  target/arm/sme: Reset SVE state in aarch64_set_svcr()
  target/arm/sme: Reset ZA state in aarch64_set_svcr()
  target/arm/sme: Rebuild hflags in aarch64_set_svcr()
  target/arm/sme: Unify set_pstate() SM/ZA helpers as set_svcr()

 linux-user/aarch64/cpu_loop.c | 11 ++
 linux-user/aarch64/signal.c   | 13 ++-
 target/arm/cpu.h  |  2 +-
 target/arm/helper-sme.h   |  3 +--
 target/arm/helper.c   | 41 ---
 target/arm/sme_helper.c   | 37 ++-
 target/arm/translate-a64.c| 19 ++--
 7 files changed, 53 insertions(+), 73 deletions(-)

-- 
2.38.1




[PATCH v2 7/8] qemu/uuid: Add UUID static initializer

2023-01-12 Thread Jonathan Cameron via
From: Ira Weiny 

UUIDs are defined as network byte order fields.  No static initializer
was available for UUIDs in their standard big endian format.

Define a big endian initializer for UUIDs.

Signed-off-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
 include/qemu/uuid.h | 12 
 1 file changed, 12 insertions(+)

diff --git a/include/qemu/uuid.h b/include/qemu/uuid.h
index 9925febfa5..dc40ee1fc9 100644
--- a/include/qemu/uuid.h
+++ b/include/qemu/uuid.h
@@ -61,6 +61,18 @@ typedef struct {
 (clock_seq_hi_and_reserved), (clock_seq_low), (node0), (node1), (node2),\
 (node3), (node4), (node5) }
 
+/* Normal (network byte order) UUID */
+#define UUID(time_low, time_mid, time_hi_and_version,\
+  clock_seq_hi_and_reserved, clock_seq_low, node0, node1, node2, \
+  node3, node4, node5)   \
+  { ((time_low) >> 24) & 0xff, ((time_low) >> 16) & 0xff,\
+((time_low) >> 8) & 0xff, (time_low) & 0xff, \
+((time_mid) >> 8) & 0xff, (time_mid) & 0xff, \
+((time_hi_and_version) >> 8) & 0xff, (time_hi_and_version) & 0xff,   \
+(clock_seq_hi_and_reserved), (clock_seq_low),\
+(node0), (node1), (node2), (node3), (node4), (node5) \
+  }
+
 #define UUID_FMT "%02hhx%02hhx%02hhx%02hhx-" \
  "%02hhx%02hhx-%02hhx%02hhx-" \
  "%02hhx%02hhx-" \
-- 
2.37.2
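
To make the byte layout produced by the initializer in the patch above concrete, here is a small self-contained sketch. `DemoUuid`, `DEMO_UUID_BE`, `demo_uuid`, and `demo_uuid_equals` are hypothetical names; the macro body follows the `UUID()` macro from the patch, and the field values are arbitrary illustration values, not a UUID used anywhere in QEMU.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte UUID container. */
typedef struct DemoUuid {
    uint8_t data[16];
} DemoUuid;

/* Same layout as the UUID() macro in the patch: most significant byte first. */
#define DEMO_UUID_BE(time_low, time_mid, time_hi_and_version,           \
                     clock_seq_hi_and_reserved, clock_seq_low,          \
                     node0, node1, node2, node3, node4, node5)          \
    { ((time_low) >> 24) & 0xff, ((time_low) >> 16) & 0xff,             \
      ((time_low) >> 8) & 0xff, (time_low) & 0xff,                      \
      ((time_mid) >> 8) & 0xff, (time_mid) & 0xff,                      \
      ((time_hi_and_version) >> 8) & 0xff,                              \
      (time_hi_and_version) & 0xff,                                     \
      (clock_seq_hi_and_reserved), (clock_seq_low),                     \
      (node0), (node1), (node2), (node3), (node4), (node5) }

/* 00112233-4455-6677-8899-aabbccddeeff, stored in the order it is written. */
static const DemoUuid demo_uuid = {
    .data = DEMO_UUID_BE(0x00112233, 0x4455, 0x6677, 0x88, 0x99,
                         0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff),
};

/* Return 1 if the UUID stores exactly the 16 bytes in `expected`. */
static int demo_uuid_equals(const DemoUuid *u, const uint8_t expected[16])
{
    assert(u != NULL && expected != NULL);
    return memcmp(u->data, expected, sizeof(u->data)) == 0;
}
```

Because every element is a constant expression, the initializer can be used for objects with static storage duration, which is the point of providing it as a macro.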




[PATCH v2 5/7] target/arm/sme: Reset ZA state in aarch64_set_svcr()

2023-01-12 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Signed-off-by: Richard Henderson 
Message-Id: <20230112004322.161330-1-richard.hender...@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/helper.c | 12 
 target/arm/sme_helper.c | 12 
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index b655dde27d..26c3bb4cdf 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6740,6 +6740,18 @@ void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)
 if (change & R_SVCR_SM_MASK) {
 arm_reset_sve_state(env);
 }
+
+/*
+ * ResetSMEState.
+ *
+ * SetPSTATE_ZA zeros on enable and disable.  We can zero this only
+ * on enable: while disabled, the storage is inaccessible and the
+ * value does not matter.  We're not saving the storage in vmstate
+ * when disabled either.
+ */
+if (change & new & R_SVCR_ZA_MASK) {
+memset(env->zarray, 0, sizeof(env->zarray));
+}
 }
 
 static void svcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index f73bf4d285..e146c17ba1 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -44,18 +44,6 @@ void helper_set_pstate_za(CPUARMState *env, uint32_t i)
 return;
 }
 aarch64_set_svcr(env, 0, R_SVCR_ZA_MASK);
-
-/*
- * ResetSMEState.
- *
- * SetPSTATE_ZA zeros on enable and disable.  We can zero this only
- * on enable: while disabled, the storage is inaccessible and the
- * value does not matter.  We're not saving the storage in vmstate
- * when disabled either.
- */
-if (i) {
-memset(env->zarray, 0, sizeof(env->zarray));
-}
 arm_rebuild_hflags(env);
 }
 
-- 
2.38.1
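
The condition `change & new & R_SVCR_ZA_MASK` moved by the patch above is true only when the ZA bit flips and its new value is 1, that is, on a 0 to 1 transition. A minimal sketch of that behavior follows, under stated assumptions: `DemoState`, `demo_set_svcr`, the bit position of `DEMO_ZA_MASK`, and the 16-byte array are hypothetical stand-ins for the real CPU state.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_ZA_MASK (UINT64_C(1) << 1)     /* hypothetical bit position */

typedef struct DemoState {
    uint64_t svcr;
    uint8_t zarray[16];                     /* tiny stand-in for ZA storage */
} DemoState;

static void demo_set_svcr(DemoState *s, uint64_t new_val, uint64_t mask)
{
    uint64_t change;

    assert(s != NULL);

    /* Bits that are selected by the mask and differ from the old value. */
    change = (s->svcr ^ new_val) & mask;
    s->svcr ^= change;

    /*
     * Zero the storage only when ZA changed and is now set (enable).
     * On disable the storage is left alone, since it is inaccessible
     * while the bit is clear.
     */
    if (change & new_val & DEMO_ZA_MASK) {
        memset(s->zarray, 0, sizeof(s->zarray));
    }
}
```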




Re: [PATCH 04/31] e1000: Use hw/net/mii.h

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

hw/net/mii.h provides common definitions for MII.

Signed-off-by: Akihiko Odaki 
---
  hw/net/e1000.c | 86 ++--
  hw/net/e1000_regs.h| 46 
  hw/net/e1000e.c|  1 +
  hw/net/e1000e_core.c   | 99 +-
  hw/net/e1000x_common.c |  5 ++-
  hw/net/e1000x_common.h |  8 ++--
  6 files changed, 101 insertions(+), 144 deletions(-)



  static const char phy_regcap[0x20] = {
-[PHY_STATUS]  = PHY_R, [M88E1000_EXT_PHY_SPEC_CTRL] = PHY_RW,
-[PHY_ID1] = PHY_R, [M88E1000_PHY_SPEC_CTRL] = PHY_RW,
-[PHY_CTRL]= PHY_RW,[PHY_1000T_CTRL] = PHY_RW,
-[PHY_LP_ABILITY]  = PHY_R, [PHY_1000T_STATUS]   = PHY_R,
-[PHY_AUTONEG_ADV] = PHY_RW,[M88E1000_RX_ERR_CNTR]   = PHY_R,
-[PHY_ID2] = PHY_R, [M88E1000_PHY_SPEC_STATUS]   = PHY_R,
-[PHY_AUTONEG_EXP] = PHY_R,
+[MII_BMSR] = PHY_R,   [M88E1000_EXT_PHY_SPEC_CTRL] = PHY_RW,


Alignment is off; otherwise:
Reviewed-by: Philippe Mathieu-Daudé 


+[MII_PHYID1] = PHY_R, [M88E1000_PHY_SPEC_CTRL] = PHY_RW,
+[MII_BMCR]   = PHY_RW,[MII_CTRL1000]   = PHY_RW,
+[MII_ANLPAR] = PHY_R, [MII_STAT1000]   = PHY_R,
+[MII_ANAR]   = PHY_RW,[M88E1000_RX_ERR_CNTR]   = PHY_R,
+[MII_PHYID2] = PHY_R, [M88E1000_PHY_SPEC_STATUS]   = PHY_R,
+[MII_ANER]   = PHY_R,
  };







Re: [PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:54, Daniel P. Berrangé wrote:

On Thu, Jan 12, 2023 at 10:18:01AM +0100, Philippe Mathieu-Daudé wrote:

On 11/1/23 23:30, Stefan Berger wrote:

To prevent getting stuck on waitpid() in case the target process does
not terminate on SIGTERM, poll on waitpid() for 10s and if the target
process has not changed state until then send a SIGKILL to it.

Signed-off-by: Stefan Berger 
---
   tests/qtest/libqtest.c | 18 +-
   1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 2fbc3b88f3..362b1f724f 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
   {
   #ifndef _WIN32
   pid_t pid;
+uint64_t end;
+
+/* poll for 10s until sending SIGKILL */
+end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;


Maybe we could use getenv() to allow tuning / using different value?


I'd rather we picked a value large enough that it will work
reliably out of the box for all scenarios with no magic
env required. We're just trying to prevent infinite waits if
something unexpected happens. We don't need to use an
aggressively short value, as most users will never hit this
scenario. I think 30 seconds is large enough to be reliable
but we could easily go higher to 60/120 if we want to be
really really sure.


I read your other comment later and I agree with you.
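
The approach discussed in this thread (poll `waitpid()` until a deadline, then fall back to `SIGKILL`) can be sketched with plain POSIX calls. This is a minimal sketch, not the libqtest code: `demo_now_ms`, `demo_wait_or_kill`, the 10 ms poll interval, and the millisecond timeout parameter are assumptions, and the timeout is left to the caller rather than fixed at 10 or 30 seconds.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic time in milliseconds. */
static int64_t demo_now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Poll waitpid() with WNOHANG for up to timeout_ms. If the child has
 * not changed state by then, send SIGKILL and wait for it to be reaped.
 *
 * Returns 0 if the child ended on its own, 1 if SIGKILL was needed,
 * -1 on error. The wait status is stored in *wstatus.
 */
static int demo_wait_or_kill(pid_t pid, int timeout_ms, int *wstatus)
{
    const struct timespec poll_interval = { .tv_sec = 0,
                                            .tv_nsec = 10 * 1000 * 1000 };
    int64_t end = demo_now_ms() + timeout_ms;
    pid_t ret;

    assert(pid > 0 && wstatus != NULL);

    do {
        ret = waitpid(pid, wstatus, WNOHANG);
        if (ret == pid) {
            return 0;               /* child already terminated */
        }
        if (ret < 0 && errno != EINTR) {
            return -1;
        }
        nanosleep(&poll_interval, NULL);
    } while (demo_now_ms() < end);

    /* Deadline passed: SIGKILL cannot be caught or ignored. */
    kill(pid, SIGKILL);
    do {
        ret = waitpid(pid, wstatus, 0);
    } while (ret < 0 && errno == EINTR);

    return ret == pid ? 1 : -1;
}
```

A caller would typically send `SIGTERM` first and then call `demo_wait_or_kill()` with a generous timeout, so that the fallback only triggers when the process is genuinely stuck.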




[PATCH v2 3/7] target/arm/sme: Introduce aarch64_set_svcr()

2023-01-12 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Signed-off-by: Richard Henderson 
Message-Id: <20230112004322.161330-1-richard.hender...@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé 
---
 linux-user/aarch64/cpu_loop.c | 2 +-
 linux-user/aarch64/signal.c   | 2 +-
 target/arm/cpu.h  | 1 +
 target/arm/helper.c   | 8 
 target/arm/sme_helper.c   | 4 ++--
 5 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/linux-user/aarch64/cpu_loop.c b/linux-user/aarch64/cpu_loop.c
index 9875d609a9..d53742e10b 100644
--- a/linux-user/aarch64/cpu_loop.c
+++ b/linux-user/aarch64/cpu_loop.c
@@ -93,8 +93,8 @@ void cpu_loop(CPUARMState *env)
  * On syscall, PSTATE.ZA is preserved, along with the ZA matrix.
  * PSTATE.SM is cleared, per SMSTOP, which does ResetSVEState.
  */
+aarch64_set_svcr(env, 0, R_SVCR_SM_MASK);
 if (FIELD_EX64(env->svcr, SVCR, SM)) {
-env->svcr = FIELD_DP64(env->svcr, SVCR, SM, 0);
 arm_rebuild_hflags(env);
 arm_reset_sve_state(env);
 }
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index 6a2c6e06d2..b6e4dcb494 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -669,11 +669,11 @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
  * Invoke the signal handler with both SM and ZA disabled.
  * When clearing SM, ResetSVEState, per SMSTOP.
  */
+aarch64_set_svcr(env, 0, R_SVCR_SM_MASK | R_SVCR_ZA_MASK);
 if (FIELD_EX64(env->svcr, SVCR, SM)) {
 arm_reset_sve_state(env);
 }
 if (env->svcr) {
-env->svcr = 0;
 arm_rebuild_hflags(env);
 }
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index bf2bce046d..0484da3322 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1123,6 +1123,7 @@ int aarch64_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
 void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
 void aarch64_sve_change_el(CPUARMState *env, int old_el,
int new_el, bool el0_a64);
+void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask);
 void arm_reset_sve_state(CPUARMState *env);
 
 /*
diff --git a/target/arm/helper.c b/target/arm/helper.c
index cee3804354..b5626627a1 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6722,11 +6722,19 @@ static CPAccessResult access_esm(CPUARMState *env, const ARMCPRegInfo *ri,
 return CP_ACCESS_OK;
 }
 
+void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)
+{
+uint64_t change = (env->svcr ^ new) & mask;
+
+env->svcr ^= change;
+}
+
 static void svcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
 helper_set_pstate_sm(env, FIELD_EX64(value, SVCR, SM));
 helper_set_pstate_za(env, FIELD_EX64(value, SVCR, ZA));
+aarch64_set_svcr(env, value, -1);
 arm_rebuild_hflags(env);
 }
 
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index b5aefa3eda..94dc084135 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -43,7 +43,7 @@ void helper_set_pstate_sm(CPUARMState *env, uint32_t i)
 if (i == FIELD_EX64(env->svcr, SVCR, SM)) {
 return;
 }
-env->svcr ^= R_SVCR_SM_MASK;
+aarch64_set_svcr(env, 0, R_SVCR_SM_MASK);
 arm_reset_sve_state(env);
 arm_rebuild_hflags(env);
 }
@@ -53,7 +53,7 @@ void helper_set_pstate_za(CPUARMState *env, uint32_t i)
 if (i == FIELD_EX64(env->svcr, SVCR, ZA)) {
 return;
 }
-env->svcr ^= R_SVCR_ZA_MASK;
+aarch64_set_svcr(env, 0, R_SVCR_ZA_MASK);
 
 /*
  * ResetSMEState.
-- 
2.38.1




[PATCH v2 0/8] hw/cxl: CXL emulation cleanups and minor fixes for upstream

2023-01-12 Thread Jonathan Cameron via
V2:
- Various minor issues found by Philippe, see individual patches.
  Note that the const_le64() patch matches changes in a set of Philippe's that
  was never applied. Philippe may send an update of that series before this
  merges.
  If that occurs, drop "qemu/bswap: Add const_le64()"
- Picked up tags.

V1 Cover letter.

A small collection of misc fixes and tidying up pulled out from various
series. I've pulled this to the top of my queue of CXL related work
as they stand fine on their own and it will reduce the noise in
the larger patch sets if these go upstream first.

Gregory's patches were posted as part of his work on adding volatile support.
https://lore.kernel.org/linux-cxl/20221006233702.18532-1-gregory.pr...@memverge.com/
https://lore.kernel.org/linux-cxl/20221128150157.97724-2-gregory.pr...@memverge.com/
I might propose this for upstream inclusion this cycle, but testing is
currently limited by lack of suitable kernel support.

Ira's patches were part of his event injection series.
https://lore.kernel.org/linux-cxl/20221221-ira-cxl-events-2022-11-17-v2-0-2ce2ecc06...@intel.com/
Intent is to propose for upstream the rest of that series shortly after
some minor changes from earlier review.

My three patches have not previously been posted.

For the curious, the current state of the QEMU CXL emulation, including the
backlog we are working through for final cleanup before proposing it for
upstream, can be found at:

https://gitlab.com/jic23/qemu/-/commits/cxl-2023-01-11

Gregory Price (2):
  hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL
  hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition

Ira Weiny (3):
  qemu/bswap: Add const_le64()
  qemu/uuid: Add UUID static initializer
  hw/cxl/mailbox: Use new UUID network order define for cel_uuid

Jonathan Cameron (3):
  hw/mem/cxl_type3: Improve error handling in realize()
  hw/pci-bridge/cxl_downstream: Fix type naming mismatch
  hw/i386/acpi: Drop duplicate _UID entry for CXL root bridge

 hw/cxl/cxl-device-utils.c  |  2 +-
 hw/cxl/cxl-mailbox-utils.c | 28 +++-
 hw/i386/acpi-build.c   |  1 -
 hw/mem/cxl_type3.c | 15 +++
 hw/pci-bridge/cxl_downstream.c |  2 +-
 include/hw/cxl/cxl_device.h|  2 +-
 include/qemu/bswap.h   | 12 +++-
 include/qemu/uuid.h| 12 
 8 files changed, 52 insertions(+), 22 deletions(-)

-- 
2.37.2




[PATCH 31/31] docs/system/devices/igb: Add igb documentation

2023-01-12 Thread Akihiko Odaki
Signed-off-by: Akihiko Odaki 
---
 MAINTAINERS  |  1 +
 docs/system/device-emulation.rst |  1 +
 docs/system/devices/igb.rst  | 70 
 3 files changed, 72 insertions(+)
 create mode 100644 docs/system/devices/igb.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 5301c1908f..9bca2adf40 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2211,6 +2211,7 @@ F: tests/qtest/fuzz-e1000e-test.c
 igb
 M: Akihiko Odaki 
 S: Maintained
+F: docs/system/devices/igb.rst
 F: hw/net/igb*
 F: tests/avocado/igb.py
 F: tests/qtest/igb-test.c
diff --git a/docs/system/device-emulation.rst b/docs/system/device-emulation.rst
index 0506006056..c1b1934e3d 100644
--- a/docs/system/device-emulation.rst
+++ b/docs/system/device-emulation.rst
@@ -93,3 +93,4 @@ Emulated Devices
devices/virtio-pmem.rst
devices/vhost-user-rng.rst
devices/canokey.rst
+   devices/igb.rst
diff --git a/docs/system/devices/igb.rst b/docs/system/devices/igb.rst
new file mode 100644
index 0000000000..1a77c82ed8
--- /dev/null
+++ b/docs/system/devices/igb.rst
@@ -0,0 +1,70 @@
+.. SPDX-License-Identifier: GPL-2.0-or-later
+.. _igb:
+
+igb
+---
+
+igb is a family of Intel's gigabit ethernet controllers. In QEMU, 82576
+emulation is implemented in particular. Its datasheet is available at [1]_.
+
+This implementation is expected to be useful to test SR-IOV networking without
+requiring physical hardware.
+
+Limitations
+===
+
+This igb implementation was tested with Linux Test Project [2]_ during the
+initial development. The command used when testing is:
+
+.. code-block:: shell
+
+  network.sh -6mta
+
+Be aware that this implementation lacks many functionalities available with the
+actual hardware, and you may experience various failures if you try to use it
+with a different operating system other than Linux or if you try functionalities
+not covered by the tests.
+
+Using igb
+=
+
+Using igb should be nothing different from using another network device. See
+:ref:`pcsys_005fnetwork` in general.
+
+However, you may also need to perform additional steps to activate SR-IOV
+feature on your guest. For Linux, refer to [3]_.
+
+Developing igb
+==
+
+igb is the successor of e1000e, and e1000e is the successor of e1000 in turn.
+As these devices are very similar, if you make a change for igb and the same
+change can be applied to e1000e and e1000, please do so.
+
+Please do not forget to run tests before submitting a change. As tests included
+in QEMU are very minimal, run some application which is likely to be affected by
+the change to confirm it works in an integrated system.
+
+Testing igb
+===
+
+A qtest of the basic functionality is available. Run the below at the build
+directory:
+
+.. code-block:: shell
+
+  meson test qtest-x86_64/qos-test
+
+ethtool can test register accesses, interrupts, etc. It is automated as an
+Avocado test and can be run with the following command:
+
+.. code:: shell
+
+  make check-avocado AVOCADO_TESTS=tests/avocado/igb.py
+
+References
+==
+
+.. [1] https://www.intel.com/content/dam/www/public/us/en/documents/datasheets/82576eb-gigabit-ethernet-controller-datasheet.pdf
+.. [2] https://github.com/linux-test-project/ltp
+.. [3] https://docs.kernel.org/PCI/pci-iov-howto.html
-- 
2.39.0




Re: [PATCH] target/arm: Introduce aarch64_set_svcr

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 01:43, Richard Henderson wrote:

Unify the two helper_set_pstate_{sm,za} in this function.
Do not call helper_* functions from svcr_write.
Cleans up linux-user usage by consolidating logic.

Cc: Fabiano Rosas 
Signed-off-by: Richard Henderson 
---

Fabiano, I expect this to replace much of your

   [RFC PATCH v2 07/19] target/arm: Move helper_set_pstate_* into cpregs.c

r~
---
  target/arm/cpu.h  |  2 +-
  target/arm/helper-sme.h   |  3 +--
  linux-user/aarch64/cpu_loop.c | 11 ++
  linux-user/aarch64/signal.c   | 13 ++-
  target/arm/helper.c   | 41 ---
  target/arm/sme_helper.c   | 37 ++-
  target/arm/translate-a64.c| 19 ++--
  7 files changed, 53 insertions(+), 73 deletions(-)


Since this patch was a bit too hard to digest at once, I split it
in trivial steps here:
https://lore.kernel.org/qemu-devel/20230112102436.1913-1-phi...@linaro.org/

For whichever version you prefer:
Reviewed-by: Philippe Mathieu-Daudé 




[PATCH v2 5/8] hw/i386/acpi: Drop duplicate _UID entry for CXL root bridge

2023-01-12 Thread Jonathan Cameron via
Noticed as this prevents iASL disassembling the DSDT table.

Reviewed-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
 hw/i386/acpi-build.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 127c4e2d50..a584b62ae2 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1482,7 +1482,6 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
 aml_append(pkg, aml_eisaid("PNP0A03"));
 aml_append(dev, aml_name_decl("_CID", pkg));
 aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
-aml_append(dev, aml_name_decl("_UID", aml_int(bus_num)));
 build_cxl_osc_method(dev);
 } else if (pci_bus_is_express(bus)) {
 aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
-- 
2.37.2




[PATCH 19/31] hw/net/net_tx_pkt: Introduce net_tx_pkt_get_eth_hdr

2023-01-12 Thread Akihiko Odaki
Expose the ethernet header so that igb can utilize it to perform the
internal routing among its SR-IOV functions.

Signed-off-by: Gal Hammer 
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Akihiko Odaki 
---
 hw/net/net_tx_pkt.c | 6 ++
 hw/net/net_tx_pkt.h | 8 
 2 files changed, 14 insertions(+)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 1cb1125d9f..4bffa1523d 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -278,6 +278,12 @@ bool net_tx_pkt_parse(struct NetTxPkt *pkt)
 }
 }
 
+struct eth_header *net_tx_pkt_get_eth_hdr(struct NetTxPkt *pkt)
+{
+assert(pkt);
+return (struct eth_header *)&pkt->l2_hdr;
+}
+
 struct virtio_net_hdr *net_tx_pkt_get_vhdr(struct NetTxPkt *pkt)
 {
 assert(pkt);
diff --git a/hw/net/net_tx_pkt.h b/hw/net/net_tx_pkt.h
index 4ec8bbe9bd..4e70453c12 100644
--- a/hw/net/net_tx_pkt.h
+++ b/hw/net/net_tx_pkt.h
@@ -44,6 +44,14 @@ void net_tx_pkt_init(struct NetTxPkt **pkt, PCIDevice *pci_dev,
  */
 void net_tx_pkt_uninit(struct NetTxPkt *pkt);
 
+/**
+ * get ethernet header
+ *
+ * @pkt:packet
+ * @ret:ethernet header
+ */
+struct eth_header *net_tx_pkt_get_eth_hdr(struct NetTxPkt *pkt);
+
 /**
  * get virtio header
  *
-- 
2.39.0




[PATCH v2 3/8] hw/cxl: set cxl-type3 device type to PCI_CLASS_MEMORY_CXL

2023-01-12 Thread Jonathan Cameron via
From: Gregory Price 

Current code sets to STORAGE_EXPRESS and then overrides it.

Reviewed-by: Davidlohr Bueso 
Reviewed-by: Ira Weiny 
Signed-off-by: Gregory Price 
Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index 252822bd82..217a5e639b 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -408,7 +408,6 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 }
 
 pci_config_set_prog_interface(pci_conf, 0x10);
-pci_config_set_class(pci_conf, PCI_CLASS_MEMORY_CXL);
 
 pcie_endpoint_cap_init(pci_dev, 0x80);
 if (ct3d->sn != UI64_NULL) {
@@ -627,7 +626,7 @@ static void ct3_class_init(ObjectClass *oc, void *data)
 
 pc->realize = ct3_realize;
 pc->exit = ct3_exit;
-pc->class_id = PCI_CLASS_STORAGE_EXPRESS;
+pc->class_id = PCI_CLASS_MEMORY_CXL;
 pc->vendor_id = PCI_VENDOR_ID_INTEL;
 pc->device_id = 0xd93; /* LVF for now */
 pc->revision = 1;
-- 
2.37.2




[PATCH v2 4/7] target/arm/sme: Reset SVE state in aarch64_set_svcr()

2023-01-12 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Move arm_reset_sve_state() calls to aarch64_set_svcr().

Signed-off-by: Richard Henderson 
Message-Id: <20230112004322.161330-1-richard.hender...@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé 
---
 linux-user/aarch64/cpu_loop.c |  1 -
 linux-user/aarch64/signal.c   |  8 +---
 target/arm/cpu.h  |  1 -
 target/arm/helper.c   | 13 +
 target/arm/sme_helper.c   | 10 --
 5 files changed, 14 insertions(+), 19 deletions(-)

diff --git a/linux-user/aarch64/cpu_loop.c b/linux-user/aarch64/cpu_loop.c
index d53742e10b..5e93d27d8f 100644
--- a/linux-user/aarch64/cpu_loop.c
+++ b/linux-user/aarch64/cpu_loop.c
@@ -96,7 +96,6 @@ void cpu_loop(CPUARMState *env)
 aarch64_set_svcr(env, 0, R_SVCR_SM_MASK);
 if (FIELD_EX64(env->svcr, SVCR, SM)) {
 arm_rebuild_hflags(env);
-arm_reset_sve_state(env);
 }
 ret = do_syscall(env,
  env->xregs[8],
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index b6e4dcb494..a326a6def5 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -665,14 +665,8 @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
 env->btype = 2;
 }
 
-/*
- * Invoke the signal handler with both SM and ZA disabled.
- * When clearing SM, ResetSVEState, per SMSTOP.
- */
+/* Invoke the signal handler with both SM and ZA disabled. */
 aarch64_set_svcr(env, 0, R_SVCR_SM_MASK | R_SVCR_ZA_MASK);
-if (FIELD_EX64(env->svcr, SVCR, SM)) {
-arm_reset_sve_state(env);
-}
 if (env->svcr) {
 arm_rebuild_hflags(env);
 }
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 0484da3322..a471add499 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1124,7 +1124,6 @@ void aarch64_sve_narrow_vq(CPUARMState *env, unsigned vq);
 void aarch64_sve_change_el(CPUARMState *env, int old_el,
int new_el, bool el0_a64);
 void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask);
-void arm_reset_sve_state(CPUARMState *env);
 
 /*
  * SVE registers are encoded in KVM's memory in an endianness-invariant format.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index b5626627a1..b655dde27d 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6722,11 +6722,24 @@ static CPAccessResult access_esm(CPUARMState *env, const ARMCPRegInfo *ri,
 return CP_ACCESS_OK;
 }
 
+/* ResetSVEState */
+static void arm_reset_sve_state(CPUARMState *env)
+{
+memset(env->vfp.zregs, 0, sizeof(env->vfp.zregs));
+/* Recall that FFR is stored as pregs[16]. */
+memset(env->vfp.pregs, 0, sizeof(env->vfp.pregs));
+vfp_set_fpcr(env, 0x089f);
+}
+
 void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)
 {
 uint64_t change = (env->svcr ^ new) & mask;
 
 env->svcr ^= change;
+
+if (change & R_SVCR_SM_MASK) {
+arm_reset_sve_state(env);
+}
 }
 
 static void svcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index 94dc084135..f73bf4d285 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -29,22 +29,12 @@
 #include "vec_internal.h"
 #include "sve_ldst_internal.h"
 
-/* ResetSVEState */
-void arm_reset_sve_state(CPUARMState *env)
-{
-memset(env->vfp.zregs, 0, sizeof(env->vfp.zregs));
-/* Recall that FFR is stored as pregs[16]. */
-memset(env->vfp.pregs, 0, sizeof(env->vfp.pregs));
-vfp_set_fpcr(env, 0x089f);
-}
-
 void helper_set_pstate_sm(CPUARMState *env, uint32_t i)
 {
 if (i == FIELD_EX64(env->svcr, SVCR, SM)) {
 return;
 }
 aarch64_set_svcr(env, 0, R_SVCR_SM_MASK);
-arm_reset_sve_state(env);
 arm_rebuild_hflags(env);
 }
 
-- 
2.38.1
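
The masked update at the heart of aarch64_set_svcr() can be tried in
isolation. The following is only a minimal sketch, not the QEMU code: the
struct, the sketch_* names and the reset counter are hypothetical stand-ins,
and the counter merely records where arm_reset_sve_state() would be called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions for this sketch: SM is bit 0, ZA is bit 1. */
#define SKETCH_SM_MASK (UINT64_C(1) << 0)
#define SKETCH_ZA_MASK (UINT64_C(1) << 1)

/* Hypothetical stand-in for the part of the CPU state the helper touches. */
struct sketch_state {
    uint64_t svcr;
    unsigned sve_resets; /* counts how often the SVE state would be reset */
};

/*
 * Copy the bits selected by mask from new_val into svcr. The SVE state is
 * "reset" only when the SM bit actually changed value.
 */
static void sketch_set_svcr(struct sketch_state *s, uint64_t new_val,
                            uint64_t mask)
{
    uint64_t change;

    assert(s != NULL);
    /* Bits that differ from the requested value, restricted to mask. */
    change = (s->svcr ^ new_val) & mask;
    /* Flipping exactly those bits makes them equal to new_val. */
    s->svcr ^= change;
    if (change & SKETCH_SM_MASK) {
        s->sve_resets++;
    }
}
```

Because the reset is keyed on the computed change rather than on the
resulting SM value, a call that leaves SM as it was does not trigger a reset,
which is why callers no longer need their own check.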




Re: [PATCH 07/31] e1000: Use more constant definitions

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

The definitions for E1000_VFTA_ENTRY_SHIFT, E1000_VFTA_ENTRY_MASK, and
E1000_VFTA_ENTRY_BIT_SHIFT_MASK were copied from:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/intel/e1000/e1000_hw.h?h=v6.0.9#n306

The definitions for E1000_NUM_UNICAST, E1000_MC_TBL_SIZE, and
E1000_VLAN_FILTER_TBL_SIZE were copied from:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/intel/e1000/e1000_hw.h?h=v6.0.9#n707

Signed-off-by: Akihiko Odaki 
---
  hw/net/e1000.c | 50 +++---
  hw/net/e1000_regs.h|  9 
  hw/net/e1000x_common.c |  5 +++--
  hw/net/e1000x_common.h |  2 +-
  4 files changed, 41 insertions(+), 25 deletions(-)




diff --git a/hw/net/e1000x_common.h b/hw/net/e1000x_common.h
index 3501e4855a..b991d814b1 100644
--- a/hw/net/e1000x_common.h
+++ b/hw/net/e1000x_common.h
@@ -102,7 +102,7 @@ enum {
  static inline void
  e1000x_inc_reg_if_not_full(uint32_t *mac, int index)
  {
-if (mac[index] != 0xffffffff) {
+if (mac[index] != UINT32_MAX) {


I wonder if using -1 wouldn't be simpler, otherwise great
cleanup!

Reviewed-by: Philippe Mathieu-Daudé 


  mac[index]++;
  }
  }
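
For reference, the saturating increment discussed above can be sketched
standalone as follows. The function name and the size_t index are assumptions
made for this illustration, not the QEMU signature.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Increment a 32-bit statistics register unless it already holds the
 * largest representable value, so the counter sticks at the maximum
 * instead of wrapping to zero.
 */
static void sketch_inc_reg_if_not_full(uint32_t *regs, size_t index)
{
    assert(regs != NULL);
    if (regs[index] != UINT32_MAX) {
        regs[index]++;
    }
}
```

On the usual platforms where uint32_t is unsigned int, comparing against -1
converts it to the same all-ones value, so both spellings behave the same;
UINT32_MAX states the intent without relying on that conversion.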





Re: [PATCH 03/31] fsl_etsec: Use hw/net/mii.h

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

hw/net/mii.h provides common definitions for MII.

Signed-off-by: Akihiko Odaki 
---
  hw/net/fsl_etsec/etsec.c | 11 ++-
  hw/net/fsl_etsec/etsec.h | 17 -
  hw/net/fsl_etsec/miim.c  |  5 +++--
  include/hw/net/mii.h |  1 +
  4 files changed, 10 insertions(+), 24 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 18/31] tests/qtest/libqos/e1000e: Remove duplicate register definitions

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

The register definitions in tests/qtest/libqos/e1000e.h had names
different from hw/net/e1000_regs.h, which made it hard to understand
what test code corresponds to the implementation. Use
hw/net/e1000_regs.h from tests/qtest/libqos/e1000e.c to remove
these duplications.

Signed-off-by: Akihiko Odaki 
---
  tests/qtest/libqos/e1000e.c | 20 ++--
  tests/qtest/libqos/e1000e.h |  5 -
  2 files changed, 10 insertions(+), 15 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH v2 6/8] qemu/bswap: Add const_le64()

2023-01-12 Thread Jonathan Cameron via
From: Ira Weiny 

Gcc requires constant versions of cpu_to_le* calls.

Add a 64 bit version.

Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 

---
v2: Update comment (Philippe)

 include/qemu/bswap.h | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/include/qemu/bswap.h b/include/qemu/bswap.h
index 346d05f2aa..c5cb9686c5 100644
--- a/include/qemu/bswap.h
+++ b/include/qemu/bswap.h
@@ -182,11 +182,20 @@ CPU_CONVERT(le, 32, uint32_t)
 CPU_CONVERT(le, 64, uint64_t)
 
 /*
- * Same as cpu_to_le{16,32}, except that gcc will figure the result is
+ * Same as cpu_to_le{16,32,64}, except that gcc will figure the result is
  * a compile-time constant if you pass in a constant.  So this can be
  * used to initialize static variables.
  */
 #if HOST_BIG_ENDIAN
+# define const_le64(_x)                          \
+    ((((_x) & 0x00000000000000ffU) << 56) |      \
+     (((_x) & 0x000000000000ff00U) << 40) |      \
+     (((_x) & 0x0000000000ff0000U) << 24) |      \
+     (((_x) & 0x00000000ff000000U) <<  8) |      \
+     (((_x) & 0x000000ff00000000U) >>  8) |      \
+     (((_x) & 0x0000ff0000000000U) >> 24) |      \
+     (((_x) & 0x00ff000000000000U) >> 40) |      \
+     (((_x) & 0xff00000000000000U) >> 56))
 # define const_le32(_x)                          \
     ((((_x) & 0x000000ffU) << 24) |              \
      (((_x) & 0x0000ff00U) <<  8) |              \
@@ -196,6 +205,7 @@ CPU_CONVERT(le, 64, uint64_t)
     ((((_x) & 0x00ff) << 8) |                    \
      (((_x) & 0xff00) >> 8))
 #else
+# define const_le64(_x) (_x)
 # define const_le32(_x) (_x)
 # define const_le16(_x) (_x)
 #endif
-- 
2.37.2
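
To illustrate why a constant form is wanted: a static initializer needs a
constant expression, which a function call cannot provide. The sketch below
is an assumption-laden illustration rather than the header itself. The
SKETCH_BSWAP64_CONST name is made up, and it swaps unconditionally instead of
depending on the host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macro: a 64-bit byte swap built only from constant operators. */
#define SKETCH_BSWAP64_CONST(x)                        \
    ((((x) & UINT64_C(0x00000000000000ff)) << 56) |    \
     (((x) & UINT64_C(0x000000000000ff00)) << 40) |    \
     (((x) & UINT64_C(0x0000000000ff0000)) << 24) |    \
     (((x) & UINT64_C(0x00000000ff000000)) <<  8) |    \
     (((x) & UINT64_C(0x000000ff00000000)) >>  8) |    \
     (((x) & UINT64_C(0x0000ff0000000000)) >> 24) |    \
     (((x) & UINT64_C(0x00ff000000000000)) >> 40) |    \
     (((x) & UINT64_C(0xff00000000000000)) >> 56))

/* Usable where the compiler demands a constant expression. */
_Static_assert(SKETCH_BSWAP64_CONST(UINT64_C(0x01)) ==
               UINT64_C(0x0100000000000000),
               "macro must fold at compile time");

/* A static object initialized with the swapped constant. */
static const uint64_t sketch_swapped =
    SKETCH_BSWAP64_CONST(UINT64_C(0x0102030405060708));

/* Runtime reference implementation, used only to cross-check the macro. */
static uint64_t sketch_bswap64_runtime(uint64_t v)
{
    uint64_t out = 0;

    for (int i = 0; i < 8; i++) {
        out = (out << 8) | (v & 0xff); /* move the lowest byte to the top */
        v >>= 8;
    }
    return out;
}
```

The masks use UINT64_C() here so that every operand is 64 bits wide
regardless of the type of the argument.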




[PATCH 07/31] e1000: Use more constant definitions

2023-01-12 Thread Akihiko Odaki
The definitions for E1000_VFTA_ENTRY_SHIFT, E1000_VFTA_ENTRY_MASK, and
E1000_VFTA_ENTRY_BIT_SHIFT_MASK were copied from:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/intel/e1000/e1000_hw.h?h=v6.0.9#n306

The definitions for E1000_NUM_UNICAST, E1000_MC_TBL_SIZE, and
E1000_VLAN_FILTER_TBL_SIZE were copied from:
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/intel/e1000/e1000_hw.h?h=v6.0.9#n707

Signed-off-by: Akihiko Odaki 
---
 hw/net/e1000.c | 50 +++---
 hw/net/e1000_regs.h|  9 
 hw/net/e1000x_common.c |  5 +++--
 hw/net/e1000x_common.h |  2 +-
 4 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 7c28200cab..8412a751ae 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -43,8 +43,6 @@
 #include "trace.h"
 #include "qom/object.h"
 
-static const uint8_t bcast[] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};
-
 /* #define E1000_DEBUG */
 
 #ifdef E1000_DEBUG
@@ -67,9 +65,8 @@ static int debugflags = DBGBIT(TXERR) | DBGBIT(GENERAL);
 
 #define IOPORT_SIZE   0x40
 #define PNPMMIO_SIZE  0x2
-#define MIN_BUF_SIZE  60 /* Min. octets in an ethernet frame sans FCS */
 
-#define MAXIMUM_ETHERNET_HDR_LEN (14+4)
+#define MAXIMUM_ETHERNET_HDR_LEN (ETH_HLEN + 4)
 
 /*
  * HW models:
@@ -239,10 +236,16 @@ static const uint16_t phy_reg_init[] = {
 
 [MII_PHYID1] = 0x141,
 /* [MII_PHYID2] configured per DevId, from e1000_reset() */
-[MII_ANAR] = 0xde1,
-[MII_ANLPAR] = 0x1e0,
-[MII_CTRL1000] = 0x0e00,
-[MII_STAT1000] = 0x3c00,
+[MII_ANAR] = MII_ANAR_CSMACD | MII_ANAR_10 |
+ MII_ANAR_10FD | MII_ANAR_TX |
+ MII_ANAR_TXFD | MII_ANAR_PAUSE |
+ MII_ANAR_PAUSE_ASYM,
+[MII_ANLPAR] = MII_ANLPAR_10 | MII_ANLPAR_10FD |
+   MII_ANLPAR_TX | MII_ANLPAR_TXFD,
+[MII_CTRL1000] = MII_CTRL1000_FULL | MII_CTRL1000_PORT |
+ MII_CTRL1000_MASTER,
+[MII_STAT1000] = MII_STAT1000_HALF | MII_STAT1000_FULL |
+ MII_STAT1000_ROK | MII_STAT1000_LOK,
 [M88E1000_PHY_SPEC_CTRL] = 0x360,
 [M88E1000_PHY_SPEC_STATUS] = 0xac00,
 [M88E1000_EXT_PHY_SPEC_CTRL] = 0x0d60,
@@ -548,9 +551,9 @@ putsum(uint8_t *data, uint32_t n, uint32_t sloc, uint32_t css, uint32_t cse)
 static inline void
 inc_tx_bcast_or_mcast_count(E1000State *s, const unsigned char *arr)
 {
-if (!memcmp(arr, bcast, sizeof bcast)) {
+if (is_broadcast_ether_addr(arr)) {
 e1000x_inc_reg_if_not_full(s->mac_reg, BPTC);
-} else if (arr[0] & 1) {
+} else if (is_multicast_ether_addr(arr)) {
 e1000x_inc_reg_if_not_full(s->mac_reg, MPTC);
 }
 }
@@ -804,14 +807,16 @@ static int
 receive_filter(E1000State *s, const uint8_t *buf, int size)
 {
 uint32_t rctl = s->mac_reg[RCTL];
-int isbcast = !memcmp(buf, bcast, sizeof bcast), ismcast = (buf[0] & 1);
+int isbcast = is_broadcast_ether_addr(buf);
+int ismcast = is_multicast_ether_addr(buf);
 
 if (e1000x_is_vlan_packet(buf, le16_to_cpu(s->mac_reg[VET])) &&
 e1000x_vlan_rx_filter_enabled(s->mac_reg)) {
-uint16_t vid = lduw_be_p(buf + 14);
-uint32_t vfta = ldl_le_p((uint32_t *)(s->mac_reg + VFTA) +
- ((vid >> 5) & 0x7f));
-if ((vfta & (1 << (vid & 0x1f))) == 0) {
+uint16_t vid = lduw_be_p(&PKT_GET_VLAN_HDR(buf)->h_tci);
+uint32_t vfta =
+ldl_le_p((uint32_t *)(s->mac_reg + VFTA) +
+ ((vid >> E1000_VFTA_ENTRY_SHIFT) & E1000_VFTA_ENTRY_MASK));
+if ((vfta & (1 << (vid & E1000_VFTA_ENTRY_BIT_SHIFT_MASK))) == 0) {
 return 0;
 }
 }
@@ -909,7 +914,7 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 uint32_t rdh_start;
 uint16_t vlan_special = 0;
 uint8_t vlan_status = 0;
-uint8_t min_buf[MIN_BUF_SIZE];
+uint8_t min_buf[ETH_ZLEN];
 struct iovec min_iov;
 uint8_t *filter_buf = iov->iov_base;
 size_t size = iov_size(iov, iovcnt);
@@ -1204,8 +1209,8 @@ static const readops macreg_readops[] = {
 [FFLT ... FFLT + 6]   = &mac_readreg,
 [RA ... RA + 31]  = &mac_readreg,
 [WUPM ... WUPM + 31]  = &mac_readreg,
-[MTA ... MTA + 127]   = &mac_readreg,
-[VFTA ... VFTA + 127] = &mac_readreg,
+[MTA ... MTA + E1000_MC_TBL_SIZE - 1]   = &mac_readreg,
+[VFTA ... VFTA + E1000_VLAN_FILTER_TBL_SIZE - 1] = &mac_readreg,
 [FFMT ... FFMT + 254] = &mac_readreg,
 [FFVT ... FFVT + 254] = &mac_readreg,
 [PBM ... PBM + 16383] = &mac_readreg,
@@ -1236,8 +1241,8 @@ static const writeops macreg_writeops[] = {
 [FFLT ... FFLT + 6]   = &set_11bit,
 [RA ... RA + 31]  = &mac_writereg,
 [WUPM ... WUPM + 31]  = &mac_writereg,
-[MTA ... MTA + 127]   = &mac_writereg,
-[VFTA ... VFTA + 127] = &mac_writereg,
+[MTA ... MTA + E1000_
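
The VLAN filter lookup that the named constants describe can be sketched on
its own. This is a rough illustration under assumptions: the SKETCH_* names
are hypothetical, their values (5, 0x7f, 0x1f and a 128-entry table) are
taken from the literals the diff replaces, and the table is a plain array
rather than device registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the literals replaced in the diff above. */
#define SKETCH_VFTA_ENTRY_SHIFT          5
#define SKETCH_VFTA_ENTRY_MASK           0x7f
#define SKETCH_VFTA_ENTRY_BIT_SHIFT_MASK 0x1f
#define SKETCH_VLAN_FILTER_TBL_SIZE      128

/* Upper bits of the VLAN id pick the 32-bit entry, lower 5 bits pick the bit. */
static bool sketch_vlan_allowed(const uint32_t *vfta, uint16_t vid)
{
    uint32_t entry;

    assert(vfta != NULL);
    entry = vfta[(vid >> SKETCH_VFTA_ENTRY_SHIFT) & SKETCH_VFTA_ENTRY_MASK];
    return (entry &
            (UINT32_C(1) << (vid & SKETCH_VFTA_ENTRY_BIT_SHIFT_MASK))) != 0;
}

/* Mark a VLAN id as accepted by setting its bit in the table. */
static void sketch_vlan_allow(uint32_t *vfta, uint16_t vid)
{
    assert(vfta != NULL);
    vfta[(vid >> SKETCH_VFTA_ENTRY_SHIFT) & SKETCH_VFTA_ENTRY_MASK] |=
        UINT32_C(1) << (vid & SKETCH_VFTA_ENTRY_BIT_SHIFT_MASK);
}
```

With 128 entries of 32 bits each the table covers 4096 ids, one bit per
possible 12-bit VLAN id.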

Re: [PATCH 00/31] Introduce igb

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

igb is a family of Intel's gigabit ethernet controllers. This series implements
82576 emulation in particular. You can see the last patch for the documentation.

Note that there is another effort to bring 82576 emulation. This series was
developed independently by Sriram Yagnaraman.
https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg04670.html

Patches 1 - 16 are general improvements for e1000 and e1000e.
Patches 17 - 18 are general improvements for e1000e test code.
Patches 19 - 21 make necessary modifications to existing files.
Patch 22 starts off implementing igb emulation by copying e1000e code.
Patch 23 renames things so that it won't collide with e1000e.



Patch 24 makes building igb possible.
Patch 25 actually transforms e1000e emulation code into igb emulation.
Patches 26 - 27 make modifications necessary for tests to existing files.
Patch 28 copies e1000e test code.
Patch 29 transforms e1000e test code into igb test code.
Patch 30 adds ethtool test automation.
Patch 31 adds the documentation.

The main reason why this series is so huge is that the early part of this series
includes general improvements for e1000e. They are placed before copying e1000e
code so we won't need to duplicate those changes for both of e1000e and igb code
later. As their utility does not depend on the igb implementation, they can be
merged earlier if necessary.


You could post patches 1-23 as "e1000x cleanups (preliminary for IGB)"
then post the rest, using the 'Based-on:' tag referring to the first
series cover letter.

I remember looking at the various e1000x cleanups independently, all
packed in the same series makes review context-switching cheaper for
me, so thanks.


Not really related to IGB but since you touched MDIO/MII/PHY files,
it would be great if we unify the MDIO as a qbus and the various PHYs
as qdevs, so boards could use any/multiple PHYs. We had 2 or 3 attempts
to do that in the past but none got merged.



[PATCH v2 4/8] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition

2023-01-12 Thread Jonathan Cameron via
From: Gregory Price 

Remove usage of magic numbers when accessing capacity fields and replace
with CXL_CAPACITY_MULTIPLIER, matching the kernel definition.

Signed-off-by: Gregory Price 
Reviewed-by: Davidlohr Bueso 
Signed-off-by: Jonathan Cameron 

---
v2:
Change to 256 * MiB and include qemu/units.h (Philippe Mathieu-Daudé)

 hw/cxl/cxl-mailbox-utils.c | 15 +--
 1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index bc1bb18844..3f67b665f5 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -12,8 +12,11 @@
 #include "hw/pci/pci.h"
 #include "qemu/cutils.h"
 #include "qemu/log.h"
+#include "qemu/units.h"
 #include "qemu/uuid.h"
 
+#define CXL_CAPACITY_MULTIPLIER   (256 * MiB)
+
 /*
  * How to add a new command, example. The command set FOO, with cmd BAR.
  *  1. Add the command set and cmd to the enum.
@@ -138,7 +141,7 @@ static ret_code cmd_firmware_update_get_info(struct cxl_cmd *cmd,
 } QEMU_PACKED *fw_info;
 QEMU_BUILD_BUG_ON(sizeof(*fw_info) != 0x50);
 
-if (cxl_dstate->pmem_size < (256 << 20)) {
+if (cxl_dstate->pmem_size < CXL_CAPACITY_MULTIPLIER) {
 return CXL_MBOX_INTERNAL_ERROR;
 }
 
@@ -283,7 +286,7 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
 CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
 uint64_t size = cxl_dstate->pmem_size;
 
-if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
 return CXL_MBOX_INTERNAL_ERROR;
 }
 
@@ -293,8 +296,8 @@ static ret_code cmd_identify_memory_device(struct cxl_cmd *cmd,
 /* PMEM only */
 snprintf(id->fw_revision, 0x10, "BWFW VERSION %02d", 0);
 
-id->total_capacity = size / (256 << 20);
-id->persistent_capacity = size / (256 << 20);
+id->total_capacity = size / CXL_CAPACITY_MULTIPLIER;
+id->persistent_capacity = size / CXL_CAPACITY_MULTIPLIER;
 id->lsa_size = cvc->get_lsa_size(ct3d);
 
 *len = sizeof(*id);
@@ -314,14 +317,14 @@ static ret_code cmd_ccls_get_partition_info(struct cxl_cmd *cmd,
 QEMU_BUILD_BUG_ON(sizeof(*part_info) != 0x20);
 uint64_t size = cxl_dstate->pmem_size;
 
-if (!QEMU_IS_ALIGNED(size, 256 << 20)) {
+if (!QEMU_IS_ALIGNED(size, CXL_CAPACITY_MULTIPLIER)) {
 return CXL_MBOX_INTERNAL_ERROR;
 }
 
 /* PMEM only */
 part_info->active_vmem = 0;
 part_info->next_vmem = 0;
-part_info->active_pmem = size / (256 << 20);
+part_info->active_pmem = size / CXL_CAPACITY_MULTIPLIER;
 part_info->next_pmem = 0;
 
 *len = sizeof(*part_info);
-- 
2.37.2
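
The unit conversion that the new definition names can be sketched as below.
This is a simplified illustration under assumptions: the SKETCH_* macros and
the helper are hypothetical, and the helper folds the alignment check and the
division into one place, which the patch itself does not do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MIB                 (UINT64_C(1) << 20)
/* Capacity fields are reported in multiples of 256 MiB. */
#define SKETCH_CAPACITY_MULTIPLIER (256 * SKETCH_MIB)

/*
 * Convert a size in bytes to capacity units. Returns false and leaves
 * *units untouched when the size is not a whole number of units.
 */
static bool sketch_capacity_units(uint64_t size_bytes, uint64_t *units)
{
    assert(units != NULL);
    if (size_bytes % SKETCH_CAPACITY_MULTIPLIER != 0) {
        return false;
    }
    *units = size_bytes / SKETCH_CAPACITY_MULTIPLIER;
    return true;
}
```

A 1 GiB device would report 4 units, while a size one byte past a 256 MiB
boundary is rejected, matching the alignment checks in the handlers above.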




Re: [PATCH 21/31] e1000: Split header files

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

Some definitions in the header files are invalid for igb so extract
them to new header files to keep igb from referring to them.

Signed-off-by: Gal Hammer 
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Akihiko Odaki 
---
  hw/net/e1000.c |   2 +-
  hw/net/e1000_common.h  | 104 +
  hw/net/e1000_regs.h| 927 +---
  hw/net/e1000e.c|   4 +-
  hw/net/e1000e_core.c   |   2 +-
  hw/net/e1000x_common.c |   2 +-
  hw/net/e1000x_common.h |  74 
  hw/net/e1000x_regs.h   | 940 +
  8 files changed, 1051 insertions(+), 1004 deletions(-)
  create mode 100644 hw/net/e1000_common.h
  create mode 100644 hw/net/e1000x_regs.h




diff --git a/hw/net/e1000_common.h b/hw/net/e1000_common.h
new file mode 100644
index 0000000000..56afad3feb
--- /dev/null
+++ b/hw/net/e1000_common.h
@@ -0,0 +1,104 @@
+/*
+ * QEMU e1000(e) emulation - shared code


s/code/definitions/


+ *
+ * Copyright (c) 2008 Qumranet
+ *
+ * Based on work done by:
+ * Nir Peleg, Tutis Systems Ltd. for Qumranet Inc.
+ * Copyright (c) 2007 Dan Aloni
+ * Copyright (c) 2004 Antony T Curtis
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#ifndef HW_NET_E1000_COMMON_H
+#define HW_NET_E1000_COMMON_H

   ...


+#include "e1000x_common.h"


No need to include this header here. Can we restrict it to the units
requiring access to these declarations? Otherwise:

Reviewed-by: Philippe Mathieu-Daudé 



+#endif






[PATCH v2 1/8] hw/mem/cxl_type3: Improve error handling in realize()

2023-01-12 Thread Jonathan Cameron via
msix_init_exclusive_bar() can fail, so if it does, clean up the address space.

Reviewed-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
 hw/mem/cxl_type3.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
index dae4fd89ca..252822bd82 100644
--- a/hw/mem/cxl_type3.c
+++ b/hw/mem/cxl_type3.c
@@ -401,7 +401,7 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 MemoryRegion *mr = &regs->component_registers;
 uint8_t *pci_conf = pci_dev->config;
 unsigned short msix_num = 1;
-int i;
+int i, rc;
 
 if (!cxl_setup_memory(ct3d, errp)) {
 return;
@@ -438,7 +438,10 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
  &ct3d->cxl_dstate.device_registers);
 
 /* MSI(-X) Initailization */
-msix_init_exclusive_bar(pci_dev, msix_num, 4, NULL);
+rc = msix_init_exclusive_bar(pci_dev, msix_num, 4, NULL);
+if (rc) {
+goto err_address_space_free;
+}
 for (i = 0; i < msix_num; i++) {
 msix_vector_use(pci_dev, i);
 }
@@ -450,6 +453,11 @@ static void ct3_realize(PCIDevice *pci_dev, Error **errp)
 cxl_cstate->cdat.free_cdat_table = ct3_free_cdat_table;
 cxl_cstate->cdat.private = ct3d;
 cxl_doe_cdat_init(cxl_cstate, errp);
+return;
+
+err_address_space_free:
+address_space_destroy(&ct3d->hostmem_as);
+return;
 }
 
 static void ct3_exit(PCIDevice *pci_dev)
-- 
2.37.2
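
The error path added here follows the common goto-based unwind pattern. A
minimal sketch of that pattern, with a hypothetical device struct and a fake
init function standing in for msix_init_exclusive_bar(), might look like:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device: tracks which resources are currently held. */
struct sketch_dev {
    bool address_space_ready;
    bool msix_ready;
};

/* Stand-in for an init step that can fail; returns 0 on success. */
static int sketch_msix_init(bool should_fail)
{
    return should_fail ? -1 : 0;
}

/* Returns true on success. On failure, earlier setup is undone. */
static bool sketch_realize(struct sketch_dev *d, bool msix_fails)
{
    int rc;

    assert(d != NULL);
    d->address_space_ready = true; /* first resource acquired */

    rc = sketch_msix_init(msix_fails);
    if (rc) {
        goto err_address_space_free;
    }
    d->msix_ready = true;
    return true;

err_address_space_free:
    /* Release in reverse order of acquisition. */
    d->address_space_ready = false;
    return false;
}
```

Keeping a single labelled exit per resource means a later failing step can
jump to the same label without duplicating the cleanup.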




Re: [PATCH 20/31] pcie: Introduce pcie_sriov_num_vfs

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

igb can use this function to change its behavior depending on the
number of virtual functions currently enabled.

Signed-off-by: Gal Hammer 
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Akihiko Odaki 
---
  hw/pci/pcie_sriov.c | 5 +
  include/hw/pci/pcie_sriov.h | 3 +++
  2 files changed, 8 insertions(+)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 16/31] e1000e: Set MII_ANER_NWAY

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

This keeps Windows driver 12.18.9.23 from generating an event with ID
30. The description of the event is as follows:

Intel(R) 82574L Gigabit Network Connection
  PROBLEM: The network adapter is configured for auto-negotiation but
the link partner is not.  This may result in a duplex mismatch.
  ACTION: Configure the link partner for auto-negotiation.


Signed-off-by: Akihiko Odaki 
---
  hw/net/e1000e_core.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)


Reviewed-by: Philippe Mathieu-Daudé 




[PATCH v2 2/8] hw/pci-bridge/cxl_downstream: Fix type naming mismatch

2023-01-12 Thread Jonathan Cameron via
Fix capitalization difference between struct name and typedef.

Tested-by: Philippe Mathieu-Daudé 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 
---
 hw/pci-bridge/cxl_downstream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/pci-bridge/cxl_downstream.c b/hw/pci-bridge/cxl_downstream.c
index 3d4e6b59cd..54f507318f 100644
--- a/hw/pci-bridge/cxl_downstream.c
+++ b/hw/pci-bridge/cxl_downstream.c
@@ -15,7 +15,7 @@
 #include "hw/pci/pcie_port.h"
 #include "qapi/error.h"
 
-typedef struct CXLDownStreamPort {
+typedef struct CXLDownstreamPort {
 /*< private >*/
 PCIESlot parent_obj;
 
-- 
2.37.2




[PATCH v2 8/8] hw/cxl/mailbox: Use new UUID network order define for cel_uuid

2023-01-12 Thread Jonathan Cameron via
From: Ira Weiny 

The cel_uuid was programmatically generated previously because there was
no static initializer for network order UUIDs.

Use the new network order initializer for cel_uuid.  Adjust
cxl_initialize_mailbox() because it can't fail now.

Update specification reference.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Ira Weiny 
Signed-off-by: Jonathan Cameron 

---
v2:
Make it const (Philippe)

 hw/cxl/cxl-device-utils.c   |  2 +-
 hw/cxl/cxl-mailbox-utils.c  | 13 ++---
 include/hw/cxl/cxl_device.h |  2 +-
 3 files changed, 8 insertions(+), 9 deletions(-)

diff --git a/hw/cxl/cxl-device-utils.c b/hw/cxl/cxl-device-utils.c
index 83ce7a8270..4c5e88aaf5 100644
--- a/hw/cxl/cxl-device-utils.c
+++ b/hw/cxl/cxl-device-utils.c
@@ -267,5 +267,5 @@ void cxl_device_register_init_common(CXLDeviceState *cxl_dstate)
 cxl_device_cap_init(cxl_dstate, MEMORY_DEVICE, 0x4000);
 memdev_reg_init_common(cxl_dstate);
 
-assert(cxl_initialize_mailbox(cxl_dstate) == 0);
+cxl_initialize_mailbox(cxl_dstate);
 }
diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index 3f67b665f5..206e04a4b8 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -193,7 +193,11 @@ static ret_code cmd_timestamp_set(struct cxl_cmd *cmd,
 return CXL_MBOX_SUCCESS;
 }
 
-static QemuUUID cel_uuid;
+/* CXL 3.0 8.2.9.5.2.1 Command Effects Log (CEL) */
+static const QemuUUID cel_uuid = {
+.data = UUID(0x0da9c0b5, 0xbf41, 0x4b78, 0x8f, 0x79,
+ 0x96, 0xb1, 0x62, 0x3b, 0x3f, 0x17)
+};
 
 /* 8.2.9.4.1 */
 static ret_code cmd_logs_get_supported(struct cxl_cmd *cmd,
@@ -458,11 +462,8 @@ void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
  DOORBELL, 0);
 }
 
-int cxl_initialize_mailbox(CXLDeviceState *cxl_dstate)
+void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate)
 {
-/* CXL 2.0: Table 169 Get Supported Logs Log Entry */
-const char *cel_uuidstr = "0da9c0b5-bf41-4b78-8f79-96b1623b3f17";
-
 for (int set = 0; set < 256; set++) {
 for (int cmd = 0; cmd < 256; cmd++) {
 if (cxl_cmd_set[set][cmd].handler) {
@@ -476,6 +477,4 @@ int cxl_initialize_mailbox(CXLDeviceState *cxl_dstate)
 }
 }
 }
-
-return qemu_uuid_parse(cel_uuidstr, &cel_uuid);
 }
diff --git a/include/hw/cxl/cxl_device.h b/include/hw/cxl/cxl_device.h
index 250adf18b2..7e5ad65c1d 100644
--- a/include/hw/cxl/cxl_device.h
+++ b/include/hw/cxl/cxl_device.h
@@ -170,7 +170,7 @@ CXL_DEVICE_CAPABILITY_HEADER_REGISTER(MEMORY_DEVICE,
   CXL_DEVICE_CAP_HDR1_OFFSET +
   CXL_DEVICE_CAP_REG_SIZE * 2)
 
-int cxl_initialize_mailbox(CXLDeviceState *cxl_dstate);
+void cxl_initialize_mailbox(CXLDeviceState *cxl_dstate);
 void cxl_process_mailbox(CXLDeviceState *cxl_dstate);
 
 #define cxl_device_cap_init(dstate, reg, cap_id)   \
-- 
2.37.2
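
As a rough illustration of what a "network order static initializer" means in the patch above, the sketch below expands the UUID fields into 16 bytes, most significant byte first, at compile time. The names DemoUUID and DEMO_UUID_BE are hypothetical and exist only for this sketch; they are not QEMU's QemuUUID type or its UUID() macro.

```c
#include <stdint.h>

/* Hypothetical stand-in for a 16-byte UUID container. */
typedef struct DemoUUID {
    uint8_t data[16];
} DemoUUID;

/*
 * Hypothetical initializer macro: lays out the three leading fields
 * most-significant byte first (network order), then the remaining
 * eight bytes as given.
 */
#define DEMO_UUID_BE(a, b, c, d0, d1, n0, n1, n2, n3, n4, n5)      \
    { (uint8_t)(((uint32_t)(a) >> 24) & 0xff),                     \
      (uint8_t)(((uint32_t)(a) >> 16) & 0xff),                     \
      (uint8_t)(((uint32_t)(a) >> 8) & 0xff),                      \
      (uint8_t)((uint32_t)(a) & 0xff),                             \
      (uint8_t)(((uint32_t)(b) >> 8) & 0xff),                      \
      (uint8_t)((uint32_t)(b) & 0xff),                             \
      (uint8_t)(((uint32_t)(c) >> 8) & 0xff),                      \
      (uint8_t)((uint32_t)(c) & 0xff),                             \
      (d0), (d1), (n0), (n1), (n2), (n3), (n4), (n5) }

/* Same field values as the cel_uuid in the patch, built without parsing. */
static const DemoUUID demo_cel_uuid = {
    .data = DEMO_UUID_BE(0x0da9c0b5, 0xbf41, 0x4b78, 0x8f, 0x79,
                         0x96, 0xb1, 0x62, 0x3b, 0x3f, 0x17)
};
```

Because the table is a compile-time constant, there is no parse step that could fail at run time, which is why the patch can turn cxl_initialize_mailbox() into a void function.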




Re: [PATCH v2 4/8] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 11:26, Jonathan Cameron wrote:

From: Gregory Price 

Remove usage of magic numbers when accessing capacity fields and replace
with CXL_CAPACITY_MULTIPLIER, matching the kernel definition.

Signed-off-by: Gregory Price 
Reviewed-by: Davidlohr Bueso 
Signed-off-by: Jonathan Cameron 

---
v2:
Change to 256 * MiB and include qemu/units.h (Philippe Mathieu-Daudé)

  hw/cxl/cxl-mailbox-utils.c | 15 +--
  1 file changed, 9 insertions(+), 6 deletions(-)

diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
index bc1bb18844..3f67b665f5 100644
--- a/hw/cxl/cxl-mailbox-utils.c
+++ b/hw/cxl/cxl-mailbox-utils.c
@@ -12,8 +12,11 @@
  #include "hw/pci/pci.h"
  #include "qemu/cutils.h"
  #include "qemu/log.h"
+#include "qemu/units.h"
  #include "qemu/uuid.h"
  
+#define CXL_CAPACITY_MULTIPLIER   (256 * MiB)


Thanks, appreciated.

Reviewed-by: Philippe Mathieu-Daudé 
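
A minimal sketch of how a named multiplier replaces a magic number when filling a capacity field, assuming the field counts units of 256 MiB. DEMO_MiB, DEMO_CAPACITY_MULTIPLIER and demo_bytes_to_capacity_units() are hypothetical names for this illustration, not the identifiers used in hw/cxl/cxl-mailbox-utils.c.

```c
#include <stdint.h>

/* Hypothetical local equivalents of MiB and the capacity multiplier. */
#define DEMO_MiB                 (UINT64_C(1) << 20)
#define DEMO_CAPACITY_MULTIPLIER (256 * DEMO_MiB)

/* Convert a size in bytes to the number of whole 256 MiB units. */
static uint64_t demo_bytes_to_capacity_units(uint64_t bytes)
{
    return bytes / DEMO_CAPACITY_MULTIPLIER;
}
```

For example, a 1 GiB device would report 4 units under this assumption.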




Re: [PATCH 14/31] e1000e: Configure ResettableClass

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

This is part of recent efforts of refactoring e1000 and e1000e.

DeviceClass's reset member is deprecated so migrate to ResettableClass.
Thre is no behavioral difference.


Typo 'There'.



Signed-off-by: Akihiko Odaki 
Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
---
  hw/net/e1000e.c | 8 +---
  1 file changed, 5 insertions(+), 3 deletions(-)





Re: [PATCH 17/31] tests/qtest/e1000e-test: Fix the code style

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

igb implementation first starts off by copying e1000e code. Correct the
code style before that.

Signed-off-by: Akihiko Odaki 
---
  tests/qtest/e1000e-test.c   | 2 +-
  tests/qtest/libqos/e1000e.c | 6 --
  2 files changed, 5 insertions(+), 3 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 01/31] e1000e: Fix the code style

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

igb implementation first starts off by copying e1000e code. Correct the
code style before that.

Signed-off-by: Akihiko Odaki 
---
  hw/net/e1000.c |  41 
  hw/net/e1000e.c|  72 ++--
  hw/net/e1000e_core.c   | 103 ++---
  hw/net/e1000e_core.h   |  66 +-
  hw/net/e1000x_common.h |  44 +-
  5 files changed, 168 insertions(+), 158 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH 13/31] e1000: Configure ResettableClass

2023-01-12 Thread Philippe Mathieu-Daudé

On 12/1/23 10:57, Akihiko Odaki wrote:

This is part of recent efforts of refactoring e1000 and e1000e.

DeviceClass's reset member is deprecated so migrate to ResettableClass.
Thre is no behavioral difference.


Typo 'There'.



Signed-off-by: Akihiko Odaki 
Reviewed-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
---
  hw/net/e1000.c | 7 ---
  1 file changed, 4 insertions(+), 3 deletions(-)





[PATCH v2 6/7] target/arm/sme: Rebuild hflags in aarch64_set_svcr()

2023-01-12 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Signed-off-by: Richard Henderson 
Message-Id: <20230112004322.161330-1-richard.hender...@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé 
---
 linux-user/aarch64/cpu_loop.c | 8 +---
 linux-user/aarch64/signal.c   | 3 ---
 target/arm/helper.c   | 6 +-
 target/arm/sme_helper.c   | 8 
 4 files changed, 6 insertions(+), 19 deletions(-)

diff --git a/linux-user/aarch64/cpu_loop.c b/linux-user/aarch64/cpu_loop.c
index 5e93d27d8f..2e2f7cf218 100644
--- a/linux-user/aarch64/cpu_loop.c
+++ b/linux-user/aarch64/cpu_loop.c
@@ -89,14 +89,8 @@ void cpu_loop(CPUARMState *env)
 
 switch (trapnr) {
 case EXCP_SWI:
-/*
- * On syscall, PSTATE.ZA is preserved, along with the ZA matrix.
- * PSTATE.SM is cleared, per SMSTOP, which does ResetSVEState.
- */
+/* On syscall, PSTATE.ZA is preserved, PSTATE.SM is cleared. */
 aarch64_set_svcr(env, 0, R_SVCR_SM_MASK);
-if (FIELD_EX64(env->svcr, SVCR, SM)) {
-arm_rebuild_hflags(env);
-}
 ret = do_syscall(env,
  env->xregs[8],
  env->xregs[0],
diff --git a/linux-user/aarch64/signal.c b/linux-user/aarch64/signal.c
index a326a6def5..b265cfd470 100644
--- a/linux-user/aarch64/signal.c
+++ b/linux-user/aarch64/signal.c
@@ -667,9 +667,6 @@ static void target_setup_frame(int usig, struct target_sigaction *ka,
 
 /* Invoke the signal handler with both SM and ZA disabled. */
 aarch64_set_svcr(env, 0, R_SVCR_SM_MASK | R_SVCR_ZA_MASK);
-if (env->svcr) {
-arm_rebuild_hflags(env);
-}
 
 if (info) {
 tswap_siginfo(&frame->info, info);
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 26c3bb4cdf..cf77bdd378 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6735,6 +6735,9 @@ void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)
 {
 uint64_t change = (env->svcr ^ new) & mask;
 
+if (change == 0) {
+return;
+}
 env->svcr ^= change;
 
 if (change & R_SVCR_SM_MASK) {
@@ -6752,6 +6755,8 @@ void aarch64_set_svcr(CPUARMState *env, uint64_t new, uint64_t mask)
 if (change & new & R_SVCR_ZA_MASK) {
 memset(env->zarray, 0, sizeof(env->zarray));
 }
+
+arm_rebuild_hflags(env);
 }
 
 static void svcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
@@ -6760,7 +6765,6 @@ static void svcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 helper_set_pstate_sm(env, FIELD_EX64(value, SVCR, SM));
 helper_set_pstate_za(env, FIELD_EX64(value, SVCR, ZA));
 aarch64_set_svcr(env, value, -1);
-arm_rebuild_hflags(env);
 }
 
 static void smcr_write(CPUARMState *env, const ARMCPRegInfo *ri,
diff --git a/target/arm/sme_helper.c b/target/arm/sme_helper.c
index e146c17ba1..3abe03e4cb 100644
--- a/target/arm/sme_helper.c
+++ b/target/arm/sme_helper.c
@@ -31,20 +31,12 @@
 
 void helper_set_pstate_sm(CPUARMState *env, uint32_t i)
 {
-if (i == FIELD_EX64(env->svcr, SVCR, SM)) {
-return;
-}
 aarch64_set_svcr(env, 0, R_SVCR_SM_MASK);
-arm_rebuild_hflags(env);
 }
 
 void helper_set_pstate_za(CPUARMState *env, uint32_t i)
 {
-if (i == FIELD_EX64(env->svcr, SVCR, ZA)) {
-return;
-}
 aarch64_set_svcr(env, 0, R_SVCR_ZA_MASK);
-arm_rebuild_hflags(env);
 }
 
 void helper_sme_zero(CPUARMState *env, uint32_t imm, uint32_t svl)
-- 
2.38.1
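
The pattern this patch centralizes (compute the changed bits, return early if none changed, otherwise apply them and rebuild derived state once) can be sketched in isolation. DemoEnv, demo_rebuild_flags() and demo_set_svcr() are hypothetical stand-ins for CPUARMState, arm_rebuild_hflags() and aarch64_set_svcr(); the real function also handles the SM and ZA side effects shown in the diff.

```c
#include <stdint.h>

/* Hypothetical stand-in for the CPU state touched by the patch. */
typedef struct DemoEnv {
    uint64_t svcr;
    unsigned rebuild_count;   /* how often derived flags were rebuilt */
} DemoEnv;

/* Hypothetical stand-in for the flag rebuild; here it only counts calls. */
static void demo_rebuild_flags(DemoEnv *env)
{
    env->rebuild_count++;
}

/* Update only the bits selected by mask; rebuild only on a real change. */
static void demo_set_svcr(DemoEnv *env, uint64_t new_val, uint64_t mask)
{
    uint64_t change = (env->svcr ^ new_val) & mask;

    if (change == 0) {
        return;               /* no bit changed, so skip the rebuild */
    }
    env->svcr ^= change;      /* flip exactly the bits that differ */
    demo_rebuild_flags(env);
}
```

With the check and the rebuild inside the setter, callers no longer need their own "did anything change" tests, which is what lets the patch delete them from cpu_loop.c, signal.c and sme_helper.c.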




Re: [PATCH] bulk: Rename TARGET_FMT_plx -> HWADDR_FMT_plx

2023-01-12 Thread Peter Maydell
On Tue, 10 Jan 2023 at 22:04, BALATON Zoltan  wrote:
>
> On Tue, 10 Jan 2023, Philippe Mathieu-Daudé wrote:
> > The 'hwaddr' type is defined in "exec/hwaddr.h" as:
> >
> >hwaddr is the type of a physical address
> >   (its size can be different from 'target_ulong').
> >
> > All definitions use the 'HWADDR_' prefix, except TARGET_FMT_plx:
> >
> > $ fgrep define include/exec/hwaddr.h
> > #define HWADDR_H
> > #define HWADDR_BITS 64
> > #define HWADDR_MAX UINT64_MAX
> > #define TARGET_FMT_plx "%016" PRIx64
> > ^^
> > #define HWADDR_PRId PRId64
> > #define HWADDR_PRIi PRIi64
> > #define HWADDR_PRIo PRIo64
> > #define HWADDR_PRIu PRIu64
> > #define HWADDR_PRIx PRIx64
>
> Why are there both TARGET_FMT_plx and HWADDR_PRIx? Why not just use
> HWADDR_PRIx instead?

TARGET_FMT_plx is part of a family of defines for printing
target_foo types; the rest are in cpu-defs.h. These all include the
'%' character. This is more convenient to use, but it's also
out-of-line with the C standard format macros like PRIx64.
The HWADDR_* macros take the approach of aligning with how you
use the C standard format macros.

As usual in QEMU, where there are two different ways of doing
things, it's probably because one of them is a lot older than
the other and written by a different person. In theory it would
be nice to apply some consistency here but it rarely seems
worth the effort of the bulk code edit.

thanks
-- PMM
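
The two macro styles described above can be compared side by side. In this sketch DEMO_FMT_plx and DEMO_PRIx are hypothetical names that mirror the shape of TARGET_FMT_plx and HWADDR_PRIx; only the standard PRIx64 macro from <inttypes.h> is real.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical names, chosen for this sketch only. */
#define DEMO_FMT_plx "%016" PRIx64   /* complete specifier, includes '%' */
#define DEMO_PRIx    PRIx64          /* conversion only, like PRIx64 */

/* Format addr using the macro that already contains '%' and the width. */
static int demo_format_full(char *buf, size_t len, uint64_t addr)
{
    return snprintf(buf, len, "addr=" DEMO_FMT_plx, addr);
}

/* Format addr using the PRIx64-style macro; the caller supplies "%016". */
static int demo_format_pri(char *buf, size_t len, uint64_t addr)
{
    return snprintf(buf, len, "addr=%016" DEMO_PRIx, addr);
}
```

Both helpers produce the same text; the difference is only where the '%' and the field width are written.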



[PATCH 22/31] igb: Copy e1000e code

2023-01-12 Thread Akihiko Odaki
Start off igb implementation by copying e1000e code first as igb
resembles e1000e.

Signed-off-by: Gal Hammer 
Signed-off-by: Marcel Apfelbaum 
Signed-off-by: Akihiko Odaki 
---
 MAINTAINERS |5 +
 hw/net/igb.c|  726 +
 hw/net/igb_common.h |  104 ++
 hw/net/igb_core.c   | 3567 +++
 hw/net/igb_core.h   |  156 ++
 5 files changed, 4558 insertions(+)
 create mode 100644 hw/net/igb.c
 create mode 100644 hw/net/igb_common.h
 create mode 100644 hw/net/igb_core.c
 create mode 100644 hw/net/igb_core.h

diff --git a/MAINTAINERS b/MAINTAINERS
index b270eb8e5b..a3b2f67f66 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2208,6 +2208,11 @@ S: Maintained
 F: hw/net/e1000e*
 F: tests/qtest/fuzz-e1000e-test.c
 
+igb
+M: Akihiko Odaki 
+S: Maintained
+F: hw/net/igb*
+
 eepro100
 M: Stefan Weil 
 S: Maintained
diff --git a/hw/net/igb.c b/hw/net/igb.c
new file mode 100644
index 00..d61efb781e
--- /dev/null
+++ b/hw/net/igb.c
@@ -0,0 +1,726 @@
+/*
+ * QEMU INTEL 82574 GbE NIC emulation
+ *
+ * Software developer's manuals:
+ * http://www.intel.com/content/dam/doc/datasheet/82574l-gbe-controller-datasheet.pdf
+ *
+ * Copyright (c) 2015 Ravello Systems LTD (http://ravellosystems.com)
+ * Developed by Daynix Computing LTD (http://www.daynix.com)
+ *
+ * Authors:
+ * Dmitry Fleytman 
+ * Leonid Bloch 
+ * Yan Vugenfirer 
+ *
+ * Based on work done by:
+ * Nir Peleg, Tutis Systems Ltd. for Qumranet Inc.
+ * Copyright (c) 2008 Qumranet
+ * Based on work done by:
+ * Copyright (c) 2007 Dan Aloni
+ * Copyright (c) 2004 Antony T Curtis
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "net/eth.h"
+#include "net/net.h"
+#include "net/tap.h"
+#include "qemu/module.h"
+#include "qemu/range.h"
+#include "sysemu/sysemu.h"
+#include "hw/hw.h"
+#include "hw/net/mii.h"
+#include "hw/pci/msi.h"
+#include "hw/pci/msix.h"
+#include "hw/qdev-properties.h"
+#include "migration/vmstate.h"
+
+#include "e1000_common.h"
+#include "e1000e_core.h"
+
+#include "trace.h"
+#include "qapi/error.h"
+#include "qom/object.h"
+
+#define TYPE_E1000E "e1000e"
+OBJECT_DECLARE_SIMPLE_TYPE(E1000EState, E1000E)
+
+struct E1000EState {
+PCIDevice parent_obj;
+NICState *nic;
+NICConf conf;
+
+MemoryRegion mmio;
+MemoryRegion flash;
+MemoryRegion io;
+MemoryRegion msix;
+
+uint32_t ioaddr;
+
+uint16_t subsys_ven;
+uint16_t subsys;
+
+uint16_t subsys_ven_used;
+uint16_t subsys_used;
+
+bool disable_vnet;
+
+E1000ECore core;
+bool init_vet;
+};
+
+#define E1000E_MMIO_IDX 0
+#define E1000E_FLASH_IDX 1
+#define E1000E_IO_IDX   2
+#define E1000E_MSIX_IDX 3
+
+#define E1000E_MMIO_SIZE    (128 * KiB)
+#define E1000E_FLASH_SIZE   (128 * KiB)
+#define E1000E_IO_SIZE      (32)
+#define E1000E_MSIX_SIZE    (16 * KiB)
+
+#define E1000E_MSIX_TABLE   (0x)
+#define E1000E_MSIX_PBA (0x2000)
+
+static uint64_t
+e1000e_mmio_read(void *opaque, hwaddr addr, unsigned size)
+{
+E1000EState *s = opaque;
+return e1000e_core_read(&s->core, addr, size);
+}
+
+static void
+e1000e_mmio_write(void *opaque, hwaddr addr,
+   uint64_t val, unsigned size)
+{
+E1000EState *s = opaque;
+e1000e_core_write(&s->core, addr, val, size);
+}
+
+static bool
+e1000e_io_get_reg_index(E1000EState *s, uint32_t *idx)
+{
+if (s->ioaddr < 0x1) {
+*idx = s->ioaddr;
+return true;
+}
+
+if (s->ioaddr < 0x7) {
+trace_e1000e_wrn_io_addr_undefined(s->ioaddr);
+return false;
+}
+
+if (s->ioaddr < 0xF) {
+trace_e1000e_wrn_io_addr_flash(s->ioaddr);
+return false;
+}
+
+trace_e1000e_wrn_io_addr_unknown(s->ioaddr);
+return false;
+}
+
+static uint64_t
+e1000e_io_read(void *opaque, hwaddr addr, unsigned size)
+{
+E1000EState *s = opaque;
+uint32_t idx = 0;
+uint64_t val;
+
+switch (addr) {
+case E1000_IOADDR:
+trace_e1000e_io_read_addr(s->ioaddr);
+return s->ioaddr;
+case E1000_IODATA:
+if (e1000e_io_get_reg_index(s, &idx)) {
+val = e1000e_core_read(&s->core, idx, sizeof(val));
+trace_e1000e_io_read_data(idx, val);
+return val;
+}
+return 0;
+default:
+trac

[PATCH v2 1/7] target/arm/sme: Reorg SME access handling in handle_msr_i()

2023-01-12 Thread Philippe Mathieu-Daudé
From: Richard Henderson 

Signed-off-by: Richard Henderson 
Message-Id: <20230112004322.161330-1-richard.hender...@linaro.org>
[PMD: Split patch in multiple tiny steps]
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/arm/translate-a64.c | 24 +---
 1 file changed, 13 insertions(+), 11 deletions(-)

diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 2ee171f249..35cc851246 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -1841,18 +1841,20 @@ static void handle_msr_i(DisasContext *s, uint32_t insn,
 goto do_unallocated;
 }
 if (sme_access_check(s)) {
-bool i = crm & 1;
-bool changed = false;
+int old = s->pstate_sm | (s->pstate_za << 1);
+int new = (crm & 1) * 3;
+int msk = (crm >> 1) & 3;
 
-if ((crm & 2) && i != s->pstate_sm) {
-gen_helper_set_pstate_sm(cpu_env, tcg_constant_i32(i));
-changed = true;
-}
-if ((crm & 4) && i != s->pstate_za) {
-gen_helper_set_pstate_za(cpu_env, tcg_constant_i32(i));
-changed = true;
-}
-if (changed) {
+if ((old ^ new) & msk) {
+/* At least one bit changes. */
+bool i = crm & 1;
+
+if ((crm & 2) && i != s->pstate_sm) {
+gen_helper_set_pstate_sm(cpu_env, tcg_constant_i32(i));
+}
+if ((crm & 4) && i != s->pstate_za) {
+gen_helper_set_pstate_za(cpu_env, tcg_constant_i32(i));
+}
 gen_rebuild_hflags(s);
 } else {
 s->base.is_jmp = DISAS_NEXT;
-- 
2.38.1
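
The bit arithmetic introduced by this patch can be checked on its own. The helper below is a hypothetical extraction for illustration (demo_msr_changes_state() does not exist in QEMU); it reproduces the old/new/msk computation from the diff and reports whether the instruction would change PSTATE.SM or PSTATE.ZA.

```c
#include <stdbool.h>

/*
 * crm bit 0 is the value to write; crm bit 1 selects SM, crm bit 2
 * selects ZA. Returns true if at least one selected bit would change.
 */
static bool demo_msr_changes_state(bool pstate_sm, bool pstate_za,
                                   unsigned crm)
{
    int old = pstate_sm | (pstate_za << 1);   /* bit 0 = SM, bit 1 = ZA */
    int new_bits = (crm & 1) * 3;             /* 0b00 or 0b11 */
    int msk = (crm >> 1) & 3;                 /* which of SM/ZA to touch */

    return ((old ^ new_bits) & msk) != 0;
}
```

For instance, crm = 3 (set SM) with SM already set reports no change, while crm = 6 (clear both) with ZA set reports a change.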




Re: [PATCH] remove unnecessary extern "C" blocks

2023-01-12 Thread Peter Maydell
On Wed, 11 Jan 2023 at 09:14, Paolo Bonzini  wrote:
>
> On 1/10/23 11:53, Peter Maydell wrote:
> > On Tue, 10 Jan 2023 at 09:33, Paolo Bonzini  wrote:
> >>
> >> A handful of header files in QEMU are wrapped with extern "C" blocks.
> >> These are not necessary: there are no C++ source files anymore in QEMU,
> >> and even where there were some, they did not include most of these
> >> files anyway.
> >
> > Any reason not to also take out the extern "C" block in osdep.h
> > and the uses of QEMU_EXTERN_C ?
>
> qemu/osdep.h is still included by the C++ sources in qga/vss-win32.

If anything C++ still includes osdep.h then you can't remove
the handling of this from os-win32.h and os-posix.h, because
those files are included from osdep.h.

thanks
-- PMM



Re: Questions about how block devices use snapshots

2023-01-12 Thread Kevin Wolf
Am 11.01.2023 um 17:21 hat Zhiyong Ye geschrieben:
> Hi Kevin,
> 
> Can I ask again how base.img + diff.qcow2 can be re-merged into one image
> via qemu-img or hmp command when modified.img is discarded?

You can either use 'qemu-img commit' to copy all of the data from
diff.qcow2 back into base.img (this is probably what you want), or
'qemu-img rebase' to copy all of the data from base.img into diff.qcow2.

Kevin




Re: [PATCH v14 11/11] docs/s390x/cpu topology: document s390x cpu topology

2023-01-12 Thread Thomas Huth

On 05/01/2023 15.53, Pierre Morel wrote:

Add some basic examples for the definition of cpu topology
in s390x.

Signed-off-by: Pierre Morel 
---
  docs/system/s390x/cpu-topology.rst | 292 +
  docs/system/target-s390x.rst   |   1 +
  2 files changed, 293 insertions(+)
  create mode 100644 docs/system/s390x/cpu-topology.rst

diff --git a/docs/system/s390x/cpu-topology.rst b/docs/system/s390x/cpu-topology.rst
new file mode 100644
index 00..0020b70b50
--- /dev/null
+++ b/docs/system/s390x/cpu-topology.rst
@@ -0,0 +1,292 @@
+CPU Topology on s390x
+=====================
+
+CPU Topology on S390x provides up to 5 levels of topology containers:


You sometimes write "Topology" with a capital T, sometimes lower case ... 
I'd suggest to write it lower case consistently everywhere.



+nodes, drawers, books, sockets and CPUs.


Hmm, so here you mention that "nodes" are usable on s390x, too? ... in 
another spot below, you don't mention these anymore...



+While the higher level containers, Containers Topology List Entries,
+(Containers TLE) define a tree hierarchy, the lowest level of topology
+definition, the CPU Topology List Entry (CPU TLE), provides the placement
+of the CPUs inside the parent container.
+
+Currently QEMU CPU topology uses a single level of container: the sockets.
+
+For backward compatibility, threads can be declared on the ``-smp`` command
+line. They will be seen as CPUs by the guest as long as multithreading
+is not really supported by QEMU for S390.


Maybe mention that threads are not allowed with machine types >= 7.2 anymore?


+Beside the topological tree, S390x provides 3 CPU attributes:
+- CPU type
+- polarity entitlement
+- dedication
+
+Prerequisites
+-------------
+
+To use CPU Topology a Linux QEMU/KVM machine providing the CPU Topology facility
+(STFLE bit 11) is required.
+
+However, since this facility has been enabled by default in an early version
+of QEMU, we use a capability, ``KVM_CAP_S390_CPU_TOPOLOGY``, to notify KVM
+QEMU use of the CPU Topology.


Has it? I thought bit 11 was not enabled by default in the past?


+Enabling CPU topology
+---------------------
+
+Currently, CPU topology is only enabled in the host model.


add a "by default if support is available in the host kernel" at the end of 
the sentence?



+Enabling CPU topology in a CPU model is done by setting the CPU flag
+``ctop`` to ``on`` like in:
+
+.. code-block:: bash
+
+   -cpu gen16b,ctop=on
+
+Having the topology disabled by default allows migration between
+old and new QEMU without adding new flags.
+
+Default topology usage
+----------------------
+
+The CPU Topology, can be specified on the QEMU command line
+with the ``-smp`` or the ``-device`` QEMU command arguments
+without using any new attributes.
+In this case, the topology will be calculated by simply adding
+to the topology the cores based on the core-id starting with
+core-0 at position 0 of socket-0, book-0, drawer-0 with default


... here you don't mention "nodes" anymore (which you still mentioned at the 
beginning of the doc).



+modifier attributes: horizontal polarity and no dedication.
+
+In the following machine we define 8 sockets with 4 cores each.
+Note that S390 QEMU machines do not implement multithreading.


I'd use s390x instead of S390 to avoid confusion with 31-bit machines.


+.. code-block:: bash
+
+  $ qemu-system-s390x -m 2G \
+-cpu gen16b,ctop=on \
+-smp cpus=5,sockets=8,cores=4,maxcpus=32 \
+-device host-s390x-cpu,core-id=14 \
+
+New CPUs can be plugged using the device_add hmp command like in:
+
+.. code-block:: bash
+
+  (qemu) device_add gen16b-s390x-cpu,core-id=9
+
+The core-id defines the placement of the core in the topology by
+starting with core 0 in socket 0 up to maxcpus.
+
+In the example above:
+
+* There are 5 CPUs provided to the guest with the ``-smp`` command line
+  They will take the core-ids 0,1,2,3,4
+  As we have 4 cores in a socket, we have 4 CPUs provided
+  to the guest in socket 0, with core-ids 0,1,2,3.
+  The last cpu, with core-id 4, will be on socket 1.
+
+* the core with ID 14 provided by the ``-device`` command line will
+  be placed in socket 3, with core-id 14
+
+* the core with ID 9 provided by the ``device_add`` qmp command will
+  be placed in socket 2, with core-id 9
+
+Note that the core ID is machine wide and the CPU TLE masks provided
+by the STSI instruction will be written in a big endian mask:
+
+* in socket 0: 0xf000 (core id 0,1,2,3)
+* in socket 1: 0x0800 (core id 4)
+* in socket 2: 0x0040 (core id 9)
+* in socket 3: 0x0002 (core id 14)


Hmm, who's supposed to be the audience of this documentation? Users? 
Developers? For a doc in docs/system/ I'd expect this to be a documentation 
for users, so this seems to be way too much of implementation detail here 
already. If this is supposed to be a doc for developers instead, the file 
should likely rather go into doc/devel/ 
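

The placement rule and the mask values quoted from the patch above can be reproduced with a few lines of C. This is only a sketch under the assumptions of that example (4 cores per socket, machine-wide core IDs below 16, and a 16-bit mask whose leftmost bit is core 0); demo_socket_of_core() and demo_core_mask16() are hypothetical helpers, not QEMU functions.

```c
#include <stdint.h>

/* Socket index for a machine-wide core ID, filling sockets in order. */
static unsigned demo_socket_of_core(unsigned core_id,
                                    unsigned cores_per_socket)
{
    return core_id / cores_per_socket;
}

/*
 * 16-bit illustrative mask as printed in the quoted doc: core 0 is the
 * most significant bit. Only valid for core_id < 16.
 */
static uint16_t demo_core_mask16(unsigned core_id)
{
    return (uint16_t)(0x8000u >> core_id);
}
```

Under these assumptions core 14 lands in socket 3 with mask 0x0002, and cores 0 to 3 together give 0xf000 for socket 0, matching the list in the patch.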

Re: [PATCH v14 09/11] qapi/s390/cpu topology: monitor query topology information

2023-01-12 Thread Thomas Huth

On 05/01/2023 15.53, Pierre Morel wrote:

Reporting the current topology information to the admin through
the QEMU monitor.

Signed-off-by: Pierre Morel 
---

...

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 754b1e8408..5730a47f71 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -993,3 +993,19 @@ SRST
``info virtio-queue-element`` *path* *queue* [*index*]
  Display element of a given virtio queue
  ERST
+
+#if defined(TARGET_S390X) && defined(CONFIG_KVM)
+{
+.name   = "query-topology",
+.args_type  = "",
+.params = "",
+.help   = "Show information about CPU topology",
+.cmd= hmp_query_topology,
+.flags  = "p",
+},
+
+SRST
+  ``info query-topology``


"info query-topology" sounds weird ... I'd maybe rather call it only "info 
topology" or "info cpu-topology" here.


 Thomas



+Show information about CPU topology
+ERST
+#endif




