rdma: Convert qemu_rdma_write_one() to Error

Markus Armbruster Wed, 27 Sep 2023 23:50:29 -0700

Let me try to map solutions.

Markus Armbruster <arm...@redhat.com> writes:


> migration/rdma.c uses errno directly or via perror() after the following
> functions:
>
> * poll()
>
>   POSIX specifies errno is set on error.  Good.

Nothing wrong, nothing to do.

> * rdma_get_cm_event(), rdma_connect(), rdma_get_cm_event()
>
>   Manual page promises "if an error occurs, errno will be set".  Good.

Nothing wrong, nothing to do.

> * ibv_open_device()
>
>   Manual page does not mention errno.  Using it seems ill-advised.
>
>   qemu_rdma_broken_ipv6_kernel() recovers from EPERM by trying the next
>   device.  Wrong if ibv_open_device() doesn't actually set errno.
>
>   What is to be done here?

1. Investigate whether ibv_open_device() sets errno.  I can't spare time
   for that.

2. Add a comment pointing out the problem, in the hope somebody
   investigates later.

3. Do nothing.

> * ibv_reg_mr()
>
>   Manual page does not mention errno.  Using it seems ill-advised.
>
>   qemu_rdma_reg_whole_ram_blocks() and qemu_rdma_register_and_get_keys()
>   recover from errno = ENOTSUP by retrying with modified @access
>   argument.  Wrong if ibv_reg_mr() doesn't actually set errno.
>
>   What is to be done here?

Likewise.

> * ibv_get_cq_event()
>
>   Manual page does not mention errno.  Using it seems ill-advised.
>
>   qemu_rdma_block_for_wrid() calls perror().  Removed in PATCH 48.  Good
>   enough.

1. Add a comment pointing out the problem, remove it in PATCH 48.

2. Nothing wrong after the series, nothing to do.

> * ibv_post_send()
>
>   Manual page has the function return "the value of errno on failure".
>   Sounds like it sets errno to the value it returns.  However, the
>   rdma-core repository defines it as
>
>     static inline int ibv_post_send(struct ibv_qp *qp, struct ibv_send_wr *wr,
>                                     struct ibv_send_wr **bad_wr)
>     {
>             return qp->context->ops.post_send(qp, wr, bad_wr);
>     }
>
>   and at least one of the methods fails without setting errno:
>
>     static int mana_post_send(struct ibv_qp *ibqp, struct ibv_send_wr *wr,
>                               struct ibv_send_wr **bad)
>     {
>             /* This version of driver supports RAW QP only.
>              * Posting WR is done directly in the application.
>              */
>             return EOPNOTSUPP;
>     }
>
>   qemu_rdma_write_one() calls perror().  PATCH 39 (this one) replaces it
>   by error_setg(), not error_setg_errno().  Seems prudent, but should be
>   called out in the commit message.

1. Add a comment pointing out the problem, remove it in PATCH 39.

2. Pass @ret, not @errno to error_setg_errno().

3. Nothing wrong after the series, nothing to do.

Since 2. is easy, let's do it.

> * ibv_advise_mr()
>
>   Manual page has the function return "the value of errno on failure".
>   Sounds like it sets errno to the value it returns, but my findings for
>   ibv_post_send() make me doubt it.
>
>   qemu_rdma_advise_prefetch_mr() traces strerror(errno).  Could be
>   misleading.  Drop that part?

1. Change sterror(errno) to strerror(ret)

2. Drop strerror(errno)

3. Do nothing.

Since 1. is easy, let's do it.

> * ibv_dereg_mr()
>
>   Manual page has the function return "the value of errno on failure".
>   Sounds like it sets errno to the value it returns, but my findings for
>   ibv_post_send() make me doubt it.
>
>   qemu_rdma_unregister_waiting() calls perror().  Removed in PATCH 51.
>   Good enough.

1. Add a comment pointing out the problem, remove it in PATCH 51.

2. Nothing wrong after the series, nothing to do.

> * qemu_get_cm_event_timeout()
>
>   Can fail without setting errno.
>
>   qemu_rdma_connect() calls perror().  Removed in PATCH 45.  Good
>   enough.
>
> Thoughts?

Considering all of the above...  I'd like to stick a patch documenting
problematic errno use early in the series, and fix all the easy ones
later in the series, leaving just the two difficult ones in
qemu_rdma_broken_ipv6_kernel() and qemu_rdma_reg_whole_ram_blocks().

Makes sense?

> [...]
>
> [*] https://github.com/linux-rdma/rdma-core.git
>     commit 55fa316b4b18f258d8ac1ceb4aa5a7a35b094dcf

Re: [PATCH 39/52] migration/rdma: Convert qemu_rdma_write_one() to Error

Reply via email to