Hexiaoqiao commented on PR #6591: URL: https://github.com/apache/hadoop/pull/6591#issuecomment-2017607444
@xleoken Thanks for your proposal. As @ZanderXu mentioned, I am not sure this is the proper solution for your case. IIUC, you expect the write to fail fast when there is a network issue between the client and the first DataNode while writing data to the pipeline, right? IMO, it is difficult to decide when to do that, because:

a. Sometimes we cannot determine that the problem is the network between the client and the first DataNode using only the time cost of the ACK. A timeout between DataNodes in the pipeline could also cause the client to time out while waiting.
b. If we fail fast and recover the pipeline, the time cost could be even more considerable: re-creating the pipeline and transferring the data takes more time once more than 10MB has been written out.

We have discussed this case several times. I think we need to split it into two steps: first report metrics back to the client, then improve the strategy (fast fail, switch DN, or some other way, based on the different metrics). FYI. Thanks again.
