Backport from user space. The bug is unlikely to be seen in the kernel and we have never observed it. Still, it is better to keep this place clean.

Cloning the comment from the corresponding user space commit:

We used to have an ugly problem there: when ->done is called from an IO
backend which is not aware of rpc logic, it considered all errors local.
The damage, at least for client->cs rpc users, is severe: local errors are
considered not to be inflicted by failing cluster neighbors but are blamed
on the local host, so the cluster recovery process is not triggered.

It exposed itself big time on KRPC (in user space), but it can be dangerous
for plain sock/rdma backends too. The bug is ancient, present since day
zero, but we never noticed it because errors of this kind are very rare
with TCP: write is non-blocking as a rule, so we must have filled sndbuf
when the socket aborts, and to lose the error we must not have any
uncompleted rpc requests, as an error for them will trigger the correct
path. Not easy to hit the bug, yet possible.

Let's not overthink this: it is enough to fix the issue in the client-cs
path, which can be done with a special kludge in cs_sent.

NOTE: the special ugly exception for PCS_ERR_NOMEM is inherited from the
older pcs_set_rpc_error, which in fact serves the same function in rpc
context.

Affects: #VSTOR-100586
Signed-off-by: Alexey Kuznetsov <kuz...@virtuozzo.com>
---
 fs/fuse/kio/pcs/pcs_cs.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/fs/fuse/kio/pcs/pcs_cs.c b/fs/fuse/kio/pcs/pcs_cs.c
index 0707575..ad398ac 100644
--- a/fs/fuse/kio/pcs/pcs_cs.c
+++ b/fs/fuse/kio/pcs/pcs_cs.c
@@ -551,6 +551,10 @@ static void cs_sent(struct pcs_msg *msg)
 {
 	msg->done = cs_response_done;
 	if (pcs_if_error(&msg->error)) {
+		if (msg->rpc && !msg->error.remote && msg->error.value != PCS_ERR_NOMEM) {
+			msg->error.remote = 1;
+			msg->error.offender = msg->rpc->peer_id;
+		}
 		msg->done(msg);
 		return;
 	}
-- 
1.8.3.1
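
For readers without the pcs headers at hand, the reclassification done by the four added lines in cs_sent can be sketched in isolation. This is only an illustration under stated assumptions: every sketch_* type, the SKETCH_ERR_* constants and sketch_blame_peer() are hypothetical stand-ins, not the real pcs definitions. Only the condition itself (rpc present, error not yet marked remote, value not the NOMEM code) mirrors the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical error codes; the real PCS_ERR_* values are not reproduced here. */
enum { SKETCH_ERR_OK = 0, SKETCH_ERR_NOMEM = 1, SKETCH_ERR_NET_ABORT = 2 };

/* Hypothetical, simplified counterparts of the pcs error/rpc/msg structures. */
struct sketch_error {
	int      value;     /* 0 means "no error" */
	bool     remote;    /* true if a cluster peer is to blame */
	uint64_t offender;  /* id of the blamed peer, valid when remote is set */
};

struct sketch_rpc {
	uint64_t peer_id;
};

struct sketch_msg {
	struct sketch_rpc  *rpc;   /* NULL if the message is not bound to an rpc */
	struct sketch_error error;
};

/*
 * Mirror of the condition added to cs_sent: an error reported by a backend
 * that knows nothing about rpc arrives as "local". If the message belongs to
 * an rpc and the error is not an out-of-memory condition, blame the peer so
 * that the caller can start recovery instead of blaming the local host.
 */
static void sketch_blame_peer(struct sketch_msg *msg)
{
	if (msg->error.value == SKETCH_ERR_OK)
		return;

	if (msg->rpc && !msg->error.remote &&
	    msg->error.value != SKETCH_ERR_NOMEM) {
		msg->error.remote = true;
		msg->error.offender = msg->rpc->peer_id;
	}
}
```

In this sketch a message that fails with SKETCH_ERR_NET_ABORT on an rpc whose peer_id is 42 ends up with remote set and offender equal to 42, while a SKETCH_ERR_NOMEM failure, or a message with no rpc attached, stays a local error.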