Re: Problem with migration/rdma

2024-03-06 Thread Zhijian Li (Fujitsu)
Yu, On 07/03/2024 00:30, Philippe Mathieu-Daudé wrote: > Cc'ing RDMA migration reviewers/maintainers: > > $ ./scripts/get_maintainer.pl -f migration/rdma.c > Li Zhijian (reviewer:RDMA Migration) > Peter Xu (maintainer:Migration) > Fabiano Rosas (maintainer:Migration) > > On 5/3/24 22:32, Yu

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-03-28 Thread Zhijian Li (Fujitsu)
On 28/03/2024 23:01, Peter Xu wrote: > On Thu, Mar 28, 2024 at 11:18:04AM -0300, Fabiano Rosas wrote: >> Philippe Mathieu-Daudé writes: >> >>> The whole RDMA subsystem was deprecated in commit e9a54265f5 >>> ("hw/rdma: Deprecate the pvrdma device and the rdma subsystem") >>> released in v8.2. >>

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-01 Thread Zhijian Li (Fujitsu)
Phil, on 3/29/2024 6:28 PM, Philippe Mathieu-Daudé wrote: >> >> >>> IMHO it's more important to know whether there are still users and >>> whether >>> they would still like to see it around. >> >> Agree. >> I didn't immediately express my opinion in V1 because I'm also >> consulting our >> custo

Re: [PATCH 1/2] CXL/cxl_type3: add first_dvsec_offset() helper

2024-04-01 Thread Zhijian Li (Fujitsu)
On 02/04/2024 12:09, fan wrote: > On Tue, Apr 02, 2024 at 09:46:46AM +0800, Li Zhijian via wrote: >> It helps to figure out where the first dvsec register is located. In >> addition, replace offset and size hardcore with existing macros. >> >> Signed-off-by: Li Zhijian >> --- >> hw/mem/cxl_typ

Re: [PATCH 2/2] CXL/cxl_type3: reset DVSEC CXL Control in ct3d_reset

2024-04-02 Thread Zhijian Li (Fujitsu)
On 02/04/2024 17:17, Jonathan Cameron wrote: > On Tue, 2 Apr 2024 09:46:47 +0800 > Li Zhijian wrote: > >> After the kernel commit >> 0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not >> match a CFMWS window") > > Fixes tag seems appropriate. > >> CXL type3 devices can

Re: [PATCH 2/2] CXL/cxl_type3: reset DVSEC CXL Control in ct3d_reset

2024-04-03 Thread Zhijian Li (Fujitsu)
On 03/04/2024 11:42, Li Zhijian wrote: > > > On 02/04/2024 17:17, Jonathan Cameron wrote: >> On Tue,  2 Apr 2024 09:46:47 +0800 >> Li Zhijian wrote: >> >>> After the kernel commit >>> 0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not >>> match a CFMWS window") >> >> Fix

Re: [PATCH-for-9.1 v2 2/3] migration: Remove RDMA protocol handling

2024-04-09 Thread Zhijian Li (Fujitsu)
on 4/10/2024 3:46 AM, Peter Xu wrote: >> Is there document/link about the unittest/CI for migration tests, Why >> are those tests missing? >> Is it hard or very special to set up an environment for that? maybe we >> can help in this regards. > See tests/qtest/migration-test.c. We put most of ou

Re: Problem with migration/rdma

2024-03-07 Thread Zhijian Li (Fujitsu)
On 08/03/2024 14:55, Peter Xu wrote: > On Fri, Mar 08, 2024 at 07:27:59AM +0100, Yu Zhang wrote: >> Hello Zhijian and Peter, >> >> Thank you so much for testing and confirming it. >> I created a patch in the email format, unfortunately got an issue for >> setting up the >> "Application-specific P

Re: [PATCH] migration/colo: Fix bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed.

2024-04-16 Thread Zhijian Li (Fujitsu)
On 17/04/2024 10:44, Li Zhijian wrote: > bdrv_activate_all() should not be called from the coroutine context, move > it to the QEMU thread colo_process_incoming_thread() with the bql_lock > protected. > > The backtrace is as follows: > #4 0x561af7948362 in bdrv_graph_rdlock_main_loop () a

Re: [PATCH v2] migration/colo: Fix bdrv_graph_rdlock_main_loop: Assertion `!qemu_in_coroutine()' failed.

2024-04-17 Thread Zhijian Li (Fujitsu)
On 17/04/2024 14:13, Philippe Mathieu-Daudé wrote: > On 17/4/24 04:56, Li Zhijian via wrote: >> bdrv_activate_all() should not be called from the coroutine context, move >> it to the QEMU thread colo_process_incoming_thread() with the bql_lock >> protected. >> >> The backtrace is as follows: >>  

Re: [PATCH v2 1/1] migration/dirtyrate: Fix segmentation fault

2024-04-24 Thread Zhijian Li (Fujitsu)
On 24/04/2024 12:52, mii wrote: > > On 2024/04/24 10:28, Yong Huang wrote: >> >> >> On Tue, Apr 23, 2024 at 9:35 PM Peter Xu wrote: >> >> On Tue, Apr 23, 2024 at 09:13:08AM +, Masato Imai wrote: >> > When the KVM acceleration parameter is not set, executing >> calc_dirty_rate >>

Re: [PATCH v2] hw/mem/cxl_type3: reset dvsecs in ct3d_reset()

2024-04-25 Thread Zhijian Li (Fujitsu)
ping On 11/04/2024 18:18, Jonathan Cameron wrote: > On Tue, 9 Apr 2024 15:58:46 +0800 > Li Zhijian wrote: > >> After the kernel commit >> 0cab68720598 ("cxl/pci: Fix disabling memory if DVSEC CXL Range does not >> match a CFMWS window") >> CXL type3 devices cannot be enabled again after the

Re: [PATCH v3 1/1] accel/kvm: Fix segmentation fault

2024-05-06 Thread Zhijian Li (Fujitsu)
on 5/7/2024 10:50 AM, Masato Imai wrote: > When the KVM acceleration parameter is not set, executing calc_dirty_rate > with the -r or -b option results in a segmentation fault due to accessing > a null kvm_state pointer in the kvm_dirty_ring_enabled function. This > commit adds a null check for k

Re: [PATCH 2/2] docs/clx: Change to lowercase as others

2023-05-25 Thread Zhijian Li (Fujitsu)
On 25/05/2023 19:49, Jonathan Cameron via wrote: > On Fri, 19 May 2023 16:58:02 +0800 > Li Zhijian wrote: > >> Using the same style except the 'Topo' abbreviation. >> >> Signed-off-by: Li Zhijian >> --- >> I'm not a native speaker, feel free to correct me. > > I've edited slightly and applied

Re: [PATCH] cxl: Get rid of unused cfmw_list

2024-06-05 Thread Zhijian Li (Fujitsu)
On 05/06/2024 20:02, Jonathan Cameron wrote: > On Fri, 31 May 2024 14:13:17 +0800 > Li Zhijian wrote: > >> There is no user for this member. All '-M cxl-fmw.N' options have >> been parsed and saved to CXLState.fixed_windows. >> >> Signed-off-by: Li Zhijian > > Hi Li, > > Applied to my tree w

Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu

2024-05-13 Thread Zhijian Li (Fujitsu)
Hi Fan Do you have a newer instruction to play with the DCD. It seems that the instruction in RFC[0] doesn't work for current code. [0] https://lore.kernel.org/all/20230511175609.2091136-1-fan...@samsung.com/ On 19/04/2024 07:10, nifan@gmail.com wrote: > A git tree of this series can be f

Re: [PATCH v7 09/12] hw/cxl/events: Add qmp interfaces to add/release dynamic capacity extents

2024-05-13 Thread Zhijian Li (Fujitsu)
On 19/04/2024 07:11, nifan@gmail.com wrote: > +} else if (type == DC_EVENT_ADD_CAPACITY) { > +if (cxl_extents_overlaps_dpa_range(&dcd->dc.extents, dpa, len)) { > +error_setg(errp, > + "cannot add DPA already accessible to the same

Re: [PATCH v7 04/12] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices

2024-05-14 Thread Zhijian Li (Fujitsu)
On 19/04/2024 07:10, nifan@gmail.com wrote: > From: Fan Ni > > +} > + > static bool cxl_setup_memory(CXLType3Dev *ct3d, Error **errp) > { > DeviceState *ds = DEVICE(ct3d); > @@ -635,6 +676,13 @@ static bool cxl_setup_memory(CXLType3Dev *ct3d, Error > **errp) > g_free(p

Re: [PATCH v7 06/12] hw/mem/cxl_type3: Add host backend and address space handling for DC regions

2024-05-14 Thread Zhijian Li (Fujitsu)
On 19/04/2024 07:10, nifan@gmail.com wrote: > +uint64_t dc_size; > + > +mr = host_memory_backend_get_memory(ct3d->dc.host_dc); > +dc_size = memory_region_size(mr); > +region_len = DIV_ROUND_UP(dc_size, ct3d->dc.num_regions); > + > +if (dc_size % (ct3d->dc.num_regions * CXL

Re: [PATCH 2/3] migration/colo: make colo_incoming_co() return void

2024-05-15 Thread Zhijian Li (Fujitsu)
On 16/05/2024 03:04, Fabiano Rosas wrote: > Li Zhijian via writes: > >> Currently, it always returns 0, no need to check the return value at all. >> In addition, enter colo coroutine only if migration_incoming_colo_enabled() >> is true. >> Once the destination side enters the COLO* state, the C

Re: [PATCH v7 00/12] Enabling DCD emulation support in Qemu

2024-05-16 Thread Zhijian Li (Fujitsu)
98911] RBP: 7ffebec70bd0 R08: R09: 7ffebec70640 [ 127.000879] R10: R11: 0246 R12: 00403840 [ 127.003572] R13: R14: R15: [ 127.005543] Thanks Zhijian On 17/05/2024 01:12, fan wrote: > On Tu

Re: [PATCH v8 06/14] hw/mem/cxl_type3: Add support to create DC regions to type3 memory devices

2024-05-27 Thread Zhijian Li (Fujitsu)
On 24/05/2024 01:44, nifan@gmail.com wrote: > From: Fan Ni > > With the change, when setting up memory for type3 memory device, we can > create DC regions. > A property 'num-dc-regions' is added to ct3_props to allow users to pass the > number of DC regions to create. To make it easier, oth

Re: [PATCH 0/6] refactor RDMA live migration based on rsocket API

2024-06-06 Thread Zhijian Li (Fujitsu)
On 06/06/2024 19:31, Leon Romanovsky wrote: > On Wed, Jun 05, 2024 at 10:00:24AM +, Gonglei (Arei) wrote: >> >> >>> -Original Message- >>> From: Michael S. Tsirkin [mailto:m...@redhat.com] >>> Sent: Wednesday, June 5, 2024 3:57 PM >>> To: Gonglei (Arei) >>> Cc: qemu-devel@nongnu.org;

Re: [PATCH v2 02/10] migration: Rename thread debug names

2024-06-18 Thread Zhijian Li (Fujitsu)
On 18/06/2024 02:15, Peter Xu wrote: > The postcopy thread names on dest QEMU are slightly confusing, partly I'll > need to blame myself on 36f62f11e4 ("migration: Postcopy preemption > preparation on channel creation"). E.g., "fault-fast" reads like a fast > version of "fault-default", but it's

Re: [PATCH] migration: Remove unused VMSTATE_ARRAY_TEST() macro

2024-06-21 Thread Zhijian Li (Fujitsu)
On 21/06/2024 15:03, Philippe Mathieu-Daudé wrote: > Last use of VMSTATE_ARRAY_TEST() was removed in commit 46baa9007f > ("migration/i386: Remove old non-softfloat 64bit FP support"), we > can safely get rid of it. > > Signed-off-by: Philippe Mathieu-Daudé Reviewed-by: Li Zhijian > --- >

Re: [PATCH 34/52] migration/rdma: Convert qemu_rdma_exchange_recv() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 35/52] migration/rdma: Convert qemu_rdma_exchange_send() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 36/52] migration/rdma: Convert qemu_rdma_exchange_get_response() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 37/52] migration/rdma: Convert qemu_rdma_reg_whole_ram_blocks() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 38/52] migration/rdma: Convert qemu_rdma_write_flush() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 40/52] migration/rdma: Convert qemu_rdma_write() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Just for consistency with qemu_rdma_write_one() and > qemu_rdma_write_flush(), and for slightly simpler code. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 41/52] migration/rdma: Convert qemu_rdma_post_send_control() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 42/52] migration/rdma: Convert qemu_rdma_post_recv_control() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Just for symmetry with qemu_rdma_post_send_control(). Error messages > lose detail I consider of no use to users. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 44/52] migration/rdma: Silence qemu_rdma_resolve_host()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 26/09/2023 13:50, Li Zhijian wrote: > > > On 18/09/2023 22:41, Markus Armbruster wrote: >> Functions that use an Error **errp parameter to return errors should >> not also report them to the user, because reporting is the caller's >> job.  When the caller does, the error is reported twice. 

Re: [PATCH 45/52] migration/rdma: Silence qemu_rdma_connect()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 46/52] migration/rdma: Silence qemu_rdma_reg_control()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 47/52] migration/rdma: Don't report received completion events as error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > When qemu_rdma_wait_comp_channel() receives an event from the > completion channel, it reports an error "receive cm event while wait > comp channel,cm event is T", where T is the numeric event type. > However, the function fails only when T is a dis

Re: [PATCH 48/52] migration/rdma: Silence qemu_rdma_block_for_wrid()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 49/52] migration/rdma: Silence qemu_rdma_register_and_get_keys()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 51/52] migration/rdma: Use error_report() & friends instead of stderr

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > if (local->nb_blocks != nb_dest_blocks) { > -fprintf(stderr, "ram blocks mismatch (Number of blocks %d vs %d) > " > -"Your QEMU command line parameters are probably " > -"not identical o

Re: [PATCH 52/52] migration/rdma: Fix how we show device details on open

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > qemu_rdma_dump_id() dumps RDMA device details to stdout. > > rdma_start_outgoing_migration() calls it via qemu_rdma_source_init() > and qemu_rdma_resolve_host() to show source device details. > rdma_start_incoming_migration() arranges its call via

Re: [PATCH 43/52] migration/rdma: Convert qemu_rdma_alloc_pd_cq() to Error

2023-09-25 Thread Zhijian Li (Fujitsu)
On 26/09/2023 14:41, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> Functions that use an Error **errp parameter to return errors should >>> not also report them to the user, because repo

Re: [PATCH] MAINTAINERS: Add entry for rdma migration

2023-09-26 Thread Zhijian Li (Fujitsu)
o Zhijian can see the patches and review when he still has > the bandwidth. Feel free to add my Acked tag. thanks. Acked-by: Li Zhijian > > Cc: Daniel P. Berrangé > Cc: Juan Quintela > Cc: Markus Armbruster > Cc: Zhijian Li (Fujitsu) > Cc: Fabiano Rosas > Signed-of

Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:42, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 24/52] migration/rdma: Return -1 instead of negative errno code

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Several functions return negative errno codes on failure. Callers > check for specific codes exactly never. For some of the functions, > callers couldn't check even if they wanted to, because the functions > also return negative values that aren't

Re: [PATCH 25/52] migration/rdma: Dumb down remaining int error values to -1

2023-09-26 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > This is just to make the error value more obvious. Callers don't > mind. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored

2023-09-26 Thread Zhijian Li (Fujitsu)
On 25/09/2023 15:09, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> All we do with the value of RDMAContext member @error_state is test >>> whether it's zero. Change to bool

Re: [PATCH 50/52] migration/rdma: Silence qemu_rdma_cleanup()

2023-09-26 Thread Zhijian Li (Fujitsu)
On 26/09/2023 19:52, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:42, Markus Armbruster wrote: >>> Functions that use an Error **errp parameter to return errors should >>> not also report them to the user, because repo

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-14 Thread Zhijian Li (Fujitsu)
On 15/09/2023 04:20, “William Roche wrote: > From: William Roche > > A memory page poisoned from the hypervisor level is no longer readable. > Thus, it is now treated as a zero-page for the ram saving migration phase. > > The migration of a VM will crash Qemu when it tries to read the > memory

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-17 Thread Zhijian Li (Fujitsu)
On 15/09/2023 19:31, William Roche wrote: > On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote: >> >> >> I'm okay with "RDMA isn't touched". >> BTW, could you share your reproducing program/hacking to poison the page, so >> that >> i am able

Re: [PATCH 05/52] migration/rdma: Consistently use uint64_t for work request IDs

2023-09-18 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > We use int instead of uint64_t in a few places. Change them to > uint64_t. > > This cleans up a comparison of signed qemu_rdma_block_for_wrid() > parameter @wrid_requested with unsigned @wr_id. Harmless, because the > actual arguments are non-neg

Re: [PATCH v2 1/1] migration: skip poisoned memory pages on "ram saving" phase

2023-09-20 Thread Zhijian Li (Fujitsu)
On 15/09/2023 19:31, William Roche wrote: > On 9/15/23 05:13, Zhijian Li (Fujitsu) wrote: >> >> >> I'm okay with "RDMA isn't touched". >> BTW, could you share your reproducing program/hacking to poison the page, so >> that >> i am able

Re: [PATCH 2/2] migration/rdma: zore out head.repeat to make the error more clear

2023-09-20 Thread Zhijian Li (Fujitsu)
On 20/09/2023 21:01, Fabiano Rosas wrote: > Li Zhijian writes: > >> From: Li Zhijian >> >> Previously, we got a confusion error that complains >> the RDMAControlHeader.repeat: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> Actually, it's caused b

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-20 Thread Zhijian Li (Fujitsu)
Sorry to all, i forgot to update my email address to lizhij...@fujitsu.com. Corrected it. On 20/09/2023 17:04, Li Zhijian wrote: > From: Li Zhijian > > Destination will fail with: > qemu-system-x86_64: rdma: Too many requests in this message > (3638950032).Bailing. > > migrate with RDMA is d

Re: [PATCH v2 2/7] migration: Clean up local variable shadowing

2023-09-20 Thread Zhijian Li (Fujitsu)
On 21/09/2023 02:31, Markus Armbruster wrote: > Local variables shadowing other local variables or parameters make the > code needlessly hard to understand. Tracked down with -Wshadow=local. > Clean up: delete inner declarations when they are actually redundant, > else rename variables. > > Sig

Re: [PATCH 00/52] migration/rdma: Error handling fixes

2023-09-21 Thread Zhijian Li (Fujitsu)
Perter, On 20/09/2023 00:49, Peter Xu wrote: > On Mon, Sep 18, 2023 at 04:41:14PM +0200, Markus Armbruster wrote: >> Oh dear, where to start. There's so much wrong, and in pretty obvious >> ways. This code should never have passed review. I'm refraining from >> saying more; see the commit mess

Re: [PATCH 01/52] migration/rdma: Clean up qemu_rdma_poll()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_poll()'s return type is uint64_t, even though it returns 0, > -1, or @ret, which is int. Its callers assign the return value to int > variables, then check whether it's negative. Unclean. > > Return int instead. > > Signed-off-by: Mark

Re: [PATCH 02/52] migration/rdma: Clean up qemu_rdma_data_init()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_data_init() return type is void *. It actually returns > RDMAContext *, and all its callers assign the value to an > RDMAContext *. Unclean. > > Return RDMAContext * instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijia

Re: [PATCH 03/52] migration/rdma: Clean up rdma_delete_block()'s return type

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_delete_block() always returns 0, which its only caller ignores. > Return void instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 04/52] migration/rdma: Drop fragile wr_id formatting

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > wrid_desc[] uses 4001 pointers to map four integer values to strings. > > print_wrid() accesses wrid_desc[] out of bounds when passed a negative > argument. It returns null for values 2..1999 and 2001..3999. > > qemu_rdma_poll() and qemu_rdma_blo

Re: [PATCH 07/52] migration/rdma: Give qio_channel_rdma_source_funcs internal linkage

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 08/52] migration/rdma: Fix qemu_rdma_accept() to return failure on errors

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_accept() returns 0 in some cases even when it didn't > complete its job due to errors. Impact is not obvious. I figure the > caller will soon fail again with a misleading error message. > > Fix it to return -1 on any failure. > > Signe

Re: [PATCH 09/52] migration/rdma: Put @errp parameter last

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > include/qapi/error.h demands: > > * - Functions that use Error to report errors have an Error **errp > * parameter. It should be the last parameter, except for functions > * taking variable arguments. > > qemu_rdma_connect() does not co

Re: [PATCH 10/52] migration/rdma: Eliminate error_propagate()

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When all we do with an Error we receive into a local variable is > propagating to somewhere else, we can just as well receive it there > right away. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 19 +

Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling

2023-09-21 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_add_block() can't fail. Return void, and drop the unreachable > error handling. > > Signed-off-by: Markus Armbruster > --- > migration/rdma.c | 30 +- > 1 file changed, 9 insertions(+), 21 deletions(-) > [...]

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-22 Thread Zhijian Li (Fujitsu)
On 20/09/2023 20:46, Fabiano Rosas wrote: > Li Zhijian writes: > >> From: Li Zhijian >> >> Destination will fail with: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> migrate with RDMA is different from tcp. RDMA has its own control >> message, an

Re: [PATCH 11/52] migration/rdma: Drop rdma_add_block() error handling

2023-09-22 Thread Zhijian Li (Fujitsu)
On 21/09/2023 19:15, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> rdma_add_block() can't fail. Return void, and drop the unreachable >>> error handling. >>> >>> S

Re: [PATCH 12/52] migration/rdma: Drop qemu_rdma_search_ram_block() error handling

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_search_ram_block() can't fail. Return void, and drop the > unreachable error handling. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 13/52] migration/rdma: Make qemu_rdma_buffer_mergable() return bool

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_buffer_mergable() is semantically a predicate. It returns > int 0 or 1. Return bool instead. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian

Re: [PATCH 14/52] migration/rdma: Use bool for two RDMAContext flags

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > @error_reported and @received_error are flags. The latter is even > assigned bool true. Change them from int to bool. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 6 +++--- > 1 file changed, 3 ins

Re: [PATCH 15/52] migration/rdma: Ditch useless numeric error codes in error messages

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Several error messages include numeric error codes returned by failed > functions: > > * ibv_poll_cq() returns an unspecified negative value. Useless. > > * rdma_accept and rmda_get_cm_event() return -1. Useless. s/rmda_get_cm_event/rdma_get_c

Re: [PATCH 16/52] migration/rdma: Fix io_writev(), io_readv() methods to obey contract

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > QIOChannelClass methods qio_channel_rdma_readv() and > qio_channel_rdma_writev() violate their method contract when > rdma->error_state is non-zero: > > 1. They return whatever is in rdma->error_state then. Only -1 will be > fine. -2 will be

Re: [PATCH 17/52] migration/rdma: Replace dangerous macro CHECK_ERROR_STATE()

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Hiding return statements in macros is a bad idea. Use a function > instead, and open code the return part. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 43 +++---

Re: [PATCH 18/52] migration/rdma: Fix qemu_rdma_broken_ipv6_kernel() to set error

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_resolve_host() and qemu_rdma_dest_init() try addresses until > they find on that works. If none works, they return the first Error > set by qemu_rdma_broken_ipv6_kernel(), or else return a generic one. > > qemu_rdma_broken_ipv6_kernel()

Re: [PATCH 19/52] migration/rdma: Fix qemu_get_cm_event_timeout() to always set error

2023-09-22 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_get_cm_event_timeout() neglects to set an error when it fails > because rdma_get_cm_event() fails. Harmless, as its caller > qemu_rdma_connect() substitutes a generic error then. Fix it anyway. > > qemu_rdma_connect() also sets the generic e

Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > The QEMUFileHooks methods don't come with a written contract. Digging > through the code calling them, we find: > > * save_page(): I'm fine with this > >Negative values RAM_SAVE_CONTROL_DELAYED and >RAM_SAVE_CONTROL_NOT_SUPP are special

Re: [PATCH 22/52] migration/rdma: Fix rdma_getaddrinfo() error checking

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > rdma_getaddrinfo() returns 0 on success. On error, it returns one of > the EAI_ error codes like getaddrinfo() does, Good catch, It used to be -1 on error, rdma_getaddrinfo(3) updated it 2021. or -1 with errno set. > This is broken by design

Re: [PATCH 23/52] migration/rdma: Clean up qemu_rdma_wait_comp_channel()'s error value

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_wait_comp_channel() returns 0 on success, and either -1 or > rdma->error_state on failure. Callers actually expect a negative > error value. I don't see the only one callers expect a negative error code. migration/rdma.c:1654:re

Re: [PATCH 26/52] migration/rdma: Replace int error_state by bool errored

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > All we do with the value of RDMAContext member @error_state is test > whether it's zero. Change to bool and rename to @errored. > make sense! Reviewed-by: Li Zhijian Can we move this patch ahead "[PATCH 23/52] migration/rdma: Clean up qemu_rd

Re: [PATCH 27/52] migration/rdma: Drop superfluous assignments to @ret

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 35 ++- > 1 file changed, 10 insertions(+), 25 deletions(-) > > diff --git a/migration/rdma.c b/migration/rdma.c > index 85f

Re: [PATCH 28/52] migration/rdma: Check negative error values the same way everywhere

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When a function returns 0 on success, negative value on error, > checking for non-zero suffices, but checking for negative is clearer. > So do that. > This patch is no my favor convention. @Peter, Juan I'd like to hear your opinions. Thanks Zhi

Re: [PATCH 29/52] migration/rdma: Plug a memory leak and improve a message

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > When migration capability @rdma-pin-all is true, but the server cannot > honor it, qemu_rdma_connect() calls macro ERROR(), then returns > success. > > ERROR() sets an error. Since qemu_rdma_connect() returns success, its > caller rdma_start_outgo

Re: [PATCH 30/52] migration/rdma: Delete inappropriate error_report() in macro ERROR()

2023-09-24 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH 21/52] migration/rdma: Fix QEMUFileHooks method return values

2023-09-25 Thread Zhijian Li (Fujitsu)
On 25/09/2023 14:36, Markus Armbruster wrote: > "Zhijian Li (Fujitsu)" writes: > >> On 18/09/2023 22:41, Markus Armbruster wrote: >>> The QEMUFileHooks methods don't come with a written contract. Digging >>> through the code calling them, we fin

Re: [PATCH 32/52] migration/rdma: Fix error handling around rdma_getaddrinfo()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > qemu_rdma_resolve_host() and qemu_rdma_dest_init() iterate over > addresses to find one that works, holding onto the first Error from > qemu_rdma_broken_ipv6_kernel() for use when no address works. Issues: > > 1. If @errp was &error_abort or &erro

Re: [PATCH 31/52] migration/rdma: Retire macro ERROR()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > ERROR() has become "error_setg() unless an error has been set > already". Hiding the conditional in the macro is in the way of > further work. Replace the macro uses by their expansion, and delete > the macro. > > Signed-off-by: Markus Armbruster

Re: [PATCH 33/52] migration/rdma: Drop "@errp is clear" guards around error_setg()

2023-09-25 Thread Zhijian Li (Fujitsu)
On 18/09/2023 22:41, Markus Armbruster wrote: > These guards are all redundant now. > > Signed-off-by: Markus Armbruster Reviewed-by: Li Zhijian > --- > migration/rdma.c | 164 +++ > 1 file changed, 51 insertions(+), 113 deletions(-) > > diff

Re: [PATCH 00/52] migration/rdma: Error handling fixes

2023-09-25 Thread Zhijian Li (Fujitsu)
On 22/09/2023 23:21, Peter Xu wrote: > On Thu, Sep 21, 2023 at 08:27:24AM +0000, Zhijian Li (Fujitsu) wrote: >> I'm worried that I may not have enough time, ability, or environment to >> review/test >> the RDMA patches. but for this patch set, i will take a look late

Re: [PATCH 1/2] migration: Fix rdma migration failed

2023-09-25 Thread Zhijian Li (Fujitsu)
On 22/09/2023 23:42, Peter Xu wrote: > On Wed, Sep 20, 2023 at 05:04:11PM +0800, Li Zhijian wrote: >> From: Li Zhijian >> >> Destination will fail with: >> qemu-system-x86_64: rdma: Too many requests in this message >> (3638950032).Bailing. >> >> migrate with RDMA is different from tcp. RDMA ha

Re: [PATCH v2 06/53] migration/rdma: Fix unwanted integer truncation

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > qio_channel_rdma_readv() assigns the size_t value of qemu_rdma_fill() > to an int variable before it adds it to @done / subtracts it from > @want, both size_t. Truncation when qemu_rdma_fill() copies more than > INT_MAX bytes. Seems vanishingly un

Re: [PATCH v2 07/53] migration/rdma: Clean up two more harmless signed vs. unsigned issues

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > qemu_rdma_exchange_get_response() compares int parameter @expecting > with uint32_t head->type. Actual arguments are non-negative > enumeration constants, RDMAControlHeader uint32_t member type, or > qemu_rdma_exchange_recv() int parameter expectin

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

2023-10-06 Thread Zhijian Li (Fujitsu)
+rdma-core Is global variable *errno* reliable when the documentation only states "returns 0 on success, or the value of errno on failure (which indicates the failure reason)." Someone read it as "assign error code to errno and return it.", I used to think the same way. but ibv_post_send() doe

Re: [PATCH v2 51/53] migration/rdma: Downgrade qemu_rdma_cleanup() errors to warnings

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > Functions that use an Error **errp parameter to return errors should > not also report them to the user, because reporting is the caller's > job. When the caller does, the error is reported twice. When it > doesn't (because it recovered from the e

Re: [PATCH v2 52/53] migration/rdma: Use error_report() & friends instead of stderr

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > error_report() obeys -msg, reports the current error location if any, > and reports to the current monitor if any. Reporting to stderr > directly with fprintf() or perror() is wrong, because it loses all > this. > > Fix the offenders. Bonus: reso

Re: [PATCH v2 53/53] migration/rdma: Replace flawed device detail dump by tracing

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:20, Markus Armbruster wrote: > qemu_rdma_dump_id() dumps RDMA device details to stdout. > > rdma_start_outgoing_migration() calls it via qemu_rdma_source_init() > and qemu_rdma_resolve_host() to show source device details. > rdma_start_incoming_migration() arranges its call via

Re: [PATCH v2 16/53] migration/rdma: Fix or document problematic uses of errno

2023-10-06 Thread Zhijian Li (Fujitsu)
On 28/09/2023 21:19, Markus Armbruster wrote: > We use errno after calling Libibverbs functions that are not > documented to set errno (manual page does not mention errno), or where > the documentation is unclear ("returns [...] the value of errno on > failure"). While this could be read as "set

  1   2   >