Except for RDMA migration, other parts of the RDMA subsystem have been
removed since 9.1.
Due to the lack of unit tests and CI tests for RDMA migration, a few fatal
errors were introduced in past development cycles and broke RDMA
migration, and these issues[1][2] were not fixed until some
It helps to figure out where the first DVSEC register is located. In
addition, replace the hardcoded offset and size with existing macros.
Signed-off-by: Li Zhijian
---
hw/mem/cxl_type3.c | 19 +--
1 file changed, 13 insertions(+), 6 deletions(-)
diff --git a/hw/mem/cxl_type3.c b/hw/
After the kernel commit
0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not match
a CFMWS window")
CXL type3 devices cannot be enabled again after the reboot because this
flag was not reset.
This flag can be changed by the firmware or OS, so give it a
reset (default) value.
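A minimal sketch of the idea, not the actual QEMU CXL code: a register that the firmware or OS can modify is restored to its default in the reset handler, so a reboot starts from a known state. The names `DemoDev`, `DEMO_CTRL_DEFAULT`, `demo_dev_write_ctrl()` and `demo_dev_reset()` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical default value; the real register layout is not shown here. */
#define DEMO_CTRL_DEFAULT 0x0001u

typedef struct DemoDev {
    uint16_t ctrl;   /* guest-writable control register (hypothetical) */
} DemoDev;

/* Guest (firmware or OS) write path: the value may change at runtime. */
static void demo_dev_write_ctrl(DemoDev *d, uint16_t val)
{
    d->ctrl = val;
}

/* Reset path: restore the default so stale state does not survive a reboot. */
static void demo_dev_reset(DemoDev *d)
{
    d->ctrl = DEMO_CTRL_DEFAULT;
}
```

Without the reset hook, a value written before the reboot would still be visible afterwards, which is the situation the commit message describes.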
After the kernel commit
0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not match
a CFMWS window")
CXL type3 devices cannot be enabled again after the reboot because the
control register (see 8.1.3.2 in CXL specification 2.0 for more details) was
not reset.
These registers coul
bdrv_activate_all() should not be called from coroutine context; move
it to the QEMU thread colo_process_incoming_thread(), under bql_lock
protection.
The backtrace is as follows:
#4 0x561af7948362 in bdrv_graph_rdlock_main_loop () at
../block/graph-lock.c:260
#5 0x561af7907a68 i
It seems that this error does not need to be propagated to the upper
layer; output the error directly to avoid the leaks.
Closes: https://gitlab.com/qemu-project/qemu/-/issues/2283
Signed-off-by: Li Zhijian
---
backends/cryptodev-builtin.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
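The leak pattern can be sketched as follows. This is a self-contained mock, not QEMU's Error API: `DemoError`, `demo_error_set()`, `demo_error_report_and_free()` and the `demo_live_errors` counter are hypothetical names. The point is only that an error object which is not handed to a caller must be reported and freed where it is caught.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct DemoError {
    char *msg;
} DemoError;

/* Counts error objects that were allocated and not yet freed. */
static int demo_live_errors;

static void demo_error_set(DemoError **errp, const char *msg)
{
    DemoError *e;

    if (!errp) {
        return;                 /* caller is not interested in details */
    }
    e = malloc(sizeof(*e));
    if (!e) {
        abort();
    }
    e->msg = malloc(strlen(msg) + 1);
    if (!e->msg) {
        abort();
    }
    strcpy(e->msg, msg);
    *errp = e;
    demo_live_errors++;
}

/* Print the error and release it; ownership ends here. */
static void demo_error_report_and_free(DemoError *e)
{
    fprintf(stderr, "error: %s\n", e->msg);
    free(e->msg);
    free(e);
    demo_live_errors--;
}

/* A callee that always fails, for demonstration purposes. */
static int demo_operation(DemoError **errp)
{
    demo_error_set(errp, "operation failed");
    return -1;
}

/* The error is not propagated upward, so it is reported and freed here. */
static int demo_caller(void)
{
    DemoError *local_err = NULL;

    if (demo_operation(&local_err) < 0) {
        demo_error_report_and_free(local_err);
        return -1;
    }
    return 0;
}
```

If `demo_caller()` returned without calling `demo_error_report_and_free()`, the counter would stay at 1, which models the leak.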
- Explicitly show the missing module name: replication
- Fix capability name to x-colo
Signed-off-by: Li Zhijian
---
migration/migration.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/migration/migration.c b/migration/migration.c
index 6502e169a3..b4a09c561c 100644
-
Make the code tighter.
Cc: Michael Tokarev
Signed-off-by: Li Zhijian
---
This change, suggested by Michael Tokarev, came a bit late last time,
so let's include it in this set of minor patches.
---
migration/colo.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff
Currently, it always returns 0, so there is no need to check the return
value at all. In addition, enter the COLO coroutine only if
migration_incoming_colo_enabled() is true.
Once the destination side enters the COLO* state, the COLO process will
take over the remaining processes until COLO exits.
Signed-off-by:
There is no user for this member. All '-M cxl-fmw.N' options have
been parsed and saved to CXLState.fixed_windows.
Signed-off-by: Li Zhijian
---
hw/cxl/cxl-host.c| 1 -
include/hw/cxl/cxl.h | 1 -
2 files changed, 2 deletions(-)
diff --git a/hw/cxl/cxl-host.c b/hw/cxl/cxl-host.c
index c5f5f
Currently, it always returns 0, so there is no need to check the return
value at all. In addition, enter the COLO coroutine only if
migration_incoming_colo_enabled() is true.
Once the destination side enters the COLO* state, the COLO process will
take over the remaining processes until COLO exits.
Cc: Fabiano Ros
- Explicitly show the missing module name: replication
- Fix capability name to x-colo
Reviewed-by: Peter Xu
Reviewed-by: Zhang Chen
Signed-off-by: Li Zhijian
---
V2: Collected reviewed-by tags
---
migration/migration.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/
Make the code tighter.
Suggested-by: Michael Tokarev
Reviewed-by: Peter Xu
Reviewed-by: Zhang Chen
Signed-off-by: Li Zhijian
---
V2: Collected reviewed-by tags
This change/comment, suggested by Michael Tokarev, came
a bit late at that time, let's update it together in this minor set
this
This assertion always happens when we sanitize the CXL memory device.
$ echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
It is incorrect to register an MSIX number beyond the device's capability.
Expand the device's MSIX to 10 and introduce the `request_msix_number()`
helper function to dynam
This assertion always happens when we sanitize the CXL memory device.
$ echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
It is incorrect to register an MSIX number beyond the device's capability.
Expand the device's MSIX number and use the enum to maintain the *USED*
and MAX MSIX number
Fixe
msix_uninit_exclusive_bar() should be paired with msix_init_exclusive_bar().
Ensure proper resource cleanup by adding the missing
`msix_uninit_exclusive_bar()` call for the Type3 CXL device.
Signed-off-by: Li Zhijian
---
hw/mem/cxl_type3.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/hw
Address a memory leak issue by ensuring `regs->special_ops` is freed when
`msix_init_exclusive_bar()` encounters an error during CXL Type3 device
initialization.
Additionally, this patch renames err_address_space_free to err_msix_uninit
for better clarity and logical flow.
Signed-off-by: Li Zhijia
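The two cleanup fixes above follow the usual C pattern of pairing every init with an uninit and unwinding through goto labels named after the first action they perform. Below is a minimal sketch under stated assumptions: `DemoDev`, `demo_msix_init()`, `demo_msix_uninit()`, `demo_realize()`, `demo_exit()` and the label `err_free_special_ops` are hypothetical; only the label name `err_msix_uninit` is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct DemoDev {
    void *special_ops;   /* allocated before MSI-X setup */
    bool msix_ready;     /* true while MSI-X is initialized */
} DemoDev;

static int demo_msix_init(DemoDev *d, bool fail)
{
    if (fail) {
        return -1;
    }
    d->msix_ready = true;
    return 0;
}

static void demo_msix_uninit(DemoDev *d)
{
    d->msix_ready = false;
}

/* Realize: each failure unwinds exactly what was set up before it. */
static int demo_realize(DemoDev *d, bool fail_msix, bool fail_later)
{
    d->special_ops = malloc(16);
    if (!d->special_ops) {
        return -1;
    }
    if (demo_msix_init(d, fail_msix) < 0) {
        goto err_free_special_ops;   /* MSI-X init failed: free the ops */
    }
    if (fail_later) {
        goto err_msix_uninit;        /* a later step failed: undo MSI-X too */
    }
    return 0;

err_msix_uninit:
    demo_msix_uninit(d);
err_free_special_ops:
    free(d->special_ops);
    d->special_ops = NULL;
    return -1;
}

/* Exit: mirror of a successful realize, in reverse order. */
static void demo_exit(DemoDev *d)
{
    demo_msix_uninit(d);
    free(d->special_ops);
    d->special_ops = NULL;
}
```

Naming each label after its first cleanup step makes it easy to check that a `goto` lands on the right rung of the unwind ladder.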
Simply pass the errp to its callee which will set errp if needed, to
enhance error reporting for CXL Type 3 device initialization by setting
the errp when realization functions fail.
Previously, failing to set `errp` could result in errors being overlooked,
causing the system to mistakenly treat f
This assertion always happens when we sanitize the CXL memory device.
$ echo 1 > /sys/bus/cxl/devices/mem0/security/sanitize
It is incorrect to register an MSIX number beyond the device's capability.
Increase the device's MSIX number to cover the mailbox MSIX number (9).
Fixes: 43efb0bfad2b ("hw/
Introduce the `CXL_T3_MSIX_VECTOR` enumeration to specify MSIX vector
assignments specific to the Type 3 (T3) CXL device.
The primary goal of this change is to encapsulate the MSIX vector uses
that are unique to the T3 device within an enumeration, improving code
readability and maintenance by avo
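A sketch of how such an enumeration can keep the used vectors and the vector count in sync. The enumerator names and `demo_msix_vector_valid()` are hypothetical; the only value taken from the text is the mailbox vector number 9, which implies a count of at least 10.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical enumeration; only the mailbox number 9 comes from the text. */
enum DemoT3MsixVector {
    DEMO_T3_MSIX_OTHER = 0,    /* placeholder for vectors not detailed here */
    DEMO_T3_MSIX_MBOX = 9,     /* mailbox vector */
    DEMO_T3_MSIX_VECTOR_NR     /* one past the highest used vector */
};

/* A vector may only be used if it is below the number of vectors exposed. */
static bool demo_msix_vector_valid(unsigned int vec, unsigned int nr_vectors)
{
    return vec < nr_vectors;
}
```

With the count derived from the enum, adding a new vector before `DEMO_T3_MSIX_VECTOR_NR` automatically raises the number of vectors the device exposes, so a mismatch between the two cannot reappear.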
On 19/02/2025 22:11, Peter Xu wrote:
> then
> in the test it tries to detect rdma link and fetch the ip only
It should work without root permission if we just *detect* and *fetch ip*.
Do you also mean we can split new-rdma-link.sh to 2 separate scripts
- add-rdma-link.sh
Refactor the page saving logic by integrating the control_save_page()
function directly into ram_save_target_page(). This change consolidates the
RDMA migration decision-making process into a single function, enhancing
clarity and maintainability.
Signed-off-by: Li Zhijian
---
migration/ram.c |
This qtest requires an RDMA (RoCE) link on the host.
In order to make the test work smoothly, introduce a
scripts/rdma-migration-helper.sh to
- set up a new Soft-RoCE (aka RXE) link if run as root
- detect an existing RoCE link
The test will be skipped if no RoCE link is available.
# Start of rdma t
qemu_rdma_save_page() no longer returns RAM_SAVE_CONTROL_NOT_SUPP
since commit a4832d299dd ("migration/rdma: Check sooner if we are in postcopy
for save_page()")
Signed-off-by: Li Zhijian
---
migration/rdma.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/migration/rdma.c
Similar to migration_channels_and_transport_compatible(), introduce a
new helper migration_capabilities_and_transport_compatible() to check if
the capabilities are compatible with the transport.
Currently, only the capabilities vs. RDMA transport check is moved to
this function.
Signed-off-by: Li Zhijian
---
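A minimal sketch of what such a compatibility helper might look like. It is not the QEMU implementation: `DemoCaps`, `DemoTransport` and `demo_caps_and_transport_compatible()` are hypothetical, and the single rule shown (RDMA cannot be combined with postcopy-ram) is an assumption borrowed from the postcopy patch elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum DemoTransport {
    DEMO_TRANSPORT_TCP,
    DEMO_TRANSPORT_RDMA
} DemoTransport;

typedef struct DemoCaps {
    bool postcopy_ram;   /* hypothetical capability flag */
} DemoCaps;

/*
 * Return true if the capabilities can be used with the transport.
 * On failure, *reason points to a static description.
 */
static bool demo_caps_and_transport_compatible(const DemoCaps *caps,
                                               DemoTransport transport,
                                               const char **reason)
{
    if (transport == DEMO_TRANSPORT_RDMA && caps->postcopy_ram) {
        *reason = "RDMA and postcopy-ram are not compatible";
        return false;
    }
    *reason = NULL;
    return true;
}
```

Keeping all capability-versus-transport rules in one function gives a single place to extend when another incompatible combination is found.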
Address an error in RDMA-based migration by ensuring RDMA is prioritized
when saving pages in `ram_save_target_page()`.
Previously, the RDMA protocol's page-saving step was placed after other
protocols due to a refactoring in commit bc38dc2f5f3. This led to migration
failures characterized by unkn
It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.
Signed-off-by: Li Zhijian
---
migration/migration.c | 4
1 file changed, 4 insertions(+)
diff --git a/migration/migration.c b/migration/migr
Refactor the migration control logic by eliminating the
`RAM_SAVE_CONTROL_NOT_SUPP` return value within the migration codebase.
This involves moving the checks for RDMA migration status and postcopy
state from rdma_control_save_page() to control_save_page().
With this change, control_save_page() n
Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy() check that follows the migrate_rdma().
Signed-off-by: Li Zhijian
---
migration/ram.c | 2 +-
migration/rdma.c | 5 +++--
2 files changed, 4 insertions(+), 3 deletions(-)
diff --git a/migration/ram.c b/migratio
- Fix the broken RDMA migration
- Disable RDMA + postcopy
- Some cleanups
- Finally, add a qtest for RDMA
Changes since V1[0]:
Add some separate patches to refactor and clean up based on V1
[0]
https://lore.kernel.org/qemu-devel/20250218074345.638203-1-lizhij...@fujitsu.com/
Li Zhijian (8
Depending on the order of starting RDMA and setting capability,
they can be categorized into the following scenarios:
Source:
S1: [set capabilities] -> [Start RDMA outgoing]
Destination:
D1: [set capabilities] -> [Start RDMA incoming]
D2: [Start RDMA incoming] -> [set capabili
- Fix the broken RDMA migration
- Disable RDMA + postcopy
- Some cleanups
- Finally, add a qtest for RDMA
Changes since V3:
- check RDMA and capabilities are compatible on both sides # renamed from
previous V3's "migration: Add
migration_capabilities_and_transport_compatible()"
Changes
Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy() that follows the migrate_rdma().
Signed-off-by: Li Zhijian
---
V3:
reorder: 7th->4th
---
migration/rdma.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/migration/rdma.c b/migrati
control_save_page() is for RDMA only, unfold it to make the code
clearer.
In addition:
- Following the style of the other branches in ram_save_target_page(),
involve RDMA only if migrate_rdma() is true.
- Further simplify the code by removing RAM_SAVE_CONTROL_NOT_SUPP.
Signed-off
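The resulting branch structure can be sketched as below. This is an illustration, not the code in migration/ram.c: `DemoMigConfig`, `DemoSavePath` and `demo_save_target_page()` are hypothetical. Checking RDMA first reflects the ordering fix described elsewhere in this series.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum DemoSavePath {
    DEMO_SAVE_RDMA,     /* page handed to the RDMA transport */
    DEMO_SAVE_OTHER,    /* some other special-case path */
    DEMO_SAVE_NORMAL    /* plain page save */
} DemoSavePath;

typedef struct DemoMigConfig {
    bool rdma;          /* stands in for migrate_rdma() */
    bool other_path;    /* stands in for any other branch condition */
} DemoMigConfig;

/* Pick the save path; RDMA is involved only when it is enabled. */
static DemoSavePath demo_save_target_page(const DemoMigConfig *cfg)
{
    if (cfg->rdma) {
        return DEMO_SAVE_RDMA;
    }
    if (cfg->other_path) {
        return DEMO_SAVE_OTHER;
    }
    return DEMO_SAVE_NORMAL;
}
```

Because each branch is guarded by its own condition, no "not supported" return code is needed to fall through from the RDMA path to the others.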
It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.
Reviewed-by: Peter Xu
Signed-off-by: Li Zhijian
---
V3:
- collect Reviewed tag
- reorder: 6th -> 3rd
---
migration/options.c | 4
1 file
Similar to migration_channels_and_transport_compatible(), introduce a
new helper migration_capabilities_and_transport_compatible() to check if
the capabilities are compatible with the transport.
Currently, only the capabilities vs. RDMA transport check is moved to
this function.
Reviewed-by: Peter Xu
Signed-
It's believed that RDMA + postcopy-ram has been broken for a while.
Rather than spending time re-enabling it, let's simply disable it as a
trade-off.
Reviewed-by: Peter Xu
Signed-off-by: Li Zhijian
---
V3:
- collect Reviewed tag
- reorder: 6th -> 3rd
---
migration/migration.c | 4
1 fil
- Fix the broken RDMA migration
- Disable RDMA + postcopy
- Some cleanups
- Finally, add a qtest for RDMA
Changes since V2:
- squash previous 2/3/4 to '[PATCH v3 5/6] migration: Unfold
control_save_page()'
- reorder the patch layout to prevent recently added code from being deleted
agai
This qtest requires an RXE link on the host.
Here is an example to show how to add this RXE link:
$ ./new-rdma-link.sh
192.168.22.93
Signed-off-by: Li Zhijian
---
RDMA migration was broken again... due to the lack of sufficient tests/qtests.
It's ugly to add and execute a script to establish
Depending on the order of starting RDMA and setting capability,
they can be categorized into the following scenarios:
Source:
S1: [set capabilities] -> [Start RDMA outgoing]
Destination:
D1: [set capabilities] -> [Start RDMA incoming]
D2: [Start RDMA incoming] -> [set capabilities]
Previously,
- Fix the broken RDMA migration
- Disable RDMA + postcopy
- Some cleanups
- Finally, add a qtest for RDMA
Changes since V4:
- collect Reviewed-tags
- Address comments in patch "migration: Add qtest for migration over RDMA"
from Fabiano Rosas
Changes since V3:
- check RDMA and capabili
Since we have disabled RDMA + postcopy, it's safe to remove
the migration_in_postcopy() that follows the migrate_rdma().
Reviewed-by: Peter Xu
Signed-off-by: Li Zhijian
---
V3:
reorder: 7th->4th
---
migration/rdma.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/mig
control_save_page() is for RDMA only, unfold it to make the code
clearer.
In addition:
- Following the style of the other branches in ram_save_target_page(),
involve RDMA only if migrate_rdma() is true.
- Further simplify the code by removing RAM_SAVE_CONTROL_NOT_SUPP.
Reviewed-b
This qtest requires an RDMA (RoCE) link on the host.
In order to make the test work smoothly, introduce a
scripts/rdma-migration-helper.sh to detect an existing RoCE link before
running the test.
The test will be skipped if no RoCE link is available.
# Start of rdma tests
# Running /x86_64
Reverse the logical condition for HDM passthrough support in
pci_expander_bridge. This patch ensures the HDM passthrough condition
is evaluated only when hdm_for_passthrough is set to true, aligning
behavior with intended semantics and comments.
Signed-off-by: Li Zhijian
---
This change corrects
Recently, the IPv6 restriction[0] was removed from RDMA migration; add a
test for it.
[0]
https://lore.kernel.org/qemu-devel/20250326095224.9918-1-jinpu.w...@ionos.com/
Cc: Jack Wang
Cc: Michael R. Galaxy
Cc: Peter Xu
Cc: Yu Zhang
Signed-off-by: Li Zhijian
---
This test is added based on [1]
S