zhixinwen commented on code in PR #3077:
URL: https://github.com/apache/kvrocks/pull/3077#discussion_r2379954778
##########
src/cluster/batch_sender.cc:
##########
@@ -100,7 +100,7 @@ Status BatchSender::sendApplyBatchCmd(int fd, const rocksdb::WriteBatch &write_b
   GET_OR_RET(util::SockSend(fd, redis::ArrayOfBulkStrings({"APPLYBATCH", write_batch.Data()})));
-  std::string line = GET_OR_RET(util::SockReadLine(fd));
+  std::string line = GET_OR_RET(util::SockReadLineWithRetry(fd, 10, 500));
Review Comment:
I added some logging to debug this:
```
GET_OR_RET(util::SockSend(fd, redis::ArrayOfBulkStrings({"APPLYBATCH", write_batch.Data()})));
// Log the SO_RCVTIMEO (receive timeout) currently set on fd
struct timeval tv;
socklen_t tv_len = sizeof(tv);
if (getsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, &tv_len) == 0) {
  LOG(INFO) << "[migrate] fd " << fd << " SO_RCVTIMEO: " << tv.tv_sec << "s " << tv.tv_usec << "us";
} else {
  LOG(WARNING) << "[migrate] Failed to get SO_RCVTIMEO for fd " << fd << ": " << strerror(errno);
}
std::string line = GET_OR_RET(util::SockReadLine(fd));
```
With that in place, the log shows `[2025-09-25T18:08:58.993539+00:00][I][batch_sender.cc:107] [migrate] fd 5488 SO_RCVTIMEO: 1s 0us`.
The 1s receive timeout is set in `checkMultipleResponses` and stays on the socket, so it also applies to the later `APPLYBATCH` read. The `Resource temporarily unavailable` error is that timeout expiring.
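For anyone who wants to reproduce the mechanism outside kvrocks, here is a minimal sketch, assuming Linux/POSIX sockets. `ErrnoAfterSilentRecv` is a hypothetical helper, not something in the codebase. It sets `SO_RCVTIMEO` on one end of a socket pair and then reads while the peer stays silent, which mirrors a timeout set by one call affecting a later read on the same fd.

```cpp
#include <sys/socket.h>
#include <sys/time.h>
#include <unistd.h>

#include <cerrno>

// Hypothetical helper, not part of kvrocks.
// Sets a receive timeout on one end of a socket pair, then reads from it while
// the peer sends nothing. Returns the errno left by the failing call, or 0 if
// recv() unexpectedly succeeded.
int ErrnoAfterSilentRecv(long timeout_ms) {
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0) return errno;

  struct timeval tv;
  tv.tv_sec = timeout_ms / 1000;
  tv.tv_usec = (timeout_ms % 1000) * 1000;

  int result = 0;
  if (setsockopt(fds[0], SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) != 0) {
    result = errno;
  } else {
    char buf[16];
    // The option is a property of the socket, so every later read on this fd
    // inherits it until someone changes it again.
    ssize_t n = recv(fds[0], buf, sizeof(buf), 0);
    if (n < 0) result = errno;  // save before close() can overwrite errno
  }

  close(fds[0]);
  close(fds[1]);
  return result;
}
```

On Linux the returned value should be `EAGAIN` (equal to `EWOULDBLOCK` there), and `strerror(EAGAIN)` is the `Resource temporarily unavailable` text seen in the migration error.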
I think fixing compaction is the best way to go, but we should also think about
how to retry failures in general.