xleoken commented on code in PR #6591:
URL: https://github.com/apache/hadoop/pull/6591#discussion_r1508372843
##########
hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DataStreamer.java:
##########
@@ -1182,10 +1182,12 @@ public void run() {
 if (begin != null) {
 long duration = Time.monotonicNowNanos() - begin;
 if (TimeUnit.NANOSECONDS.toMillis(duration) >
 dfsclientSlowLogThresholdMs) {
- LOG.info("Slow ReadProcessor read fields for block " + block
+ final String msg = "Slow ReadProcessor read fields for block " + block
 + " took " + TimeUnit.NANOSECONDS.toMillis(duration) + "ms (threshold="
 + dfsclientSlowLogThresholdMs + "ms); ack: " + ack
- + ", targets: " + Arrays.asList(targets));
+ + ", targets: " + Arrays.asList(targets);
+ LOG.warn(msg);
+ throw new IOException(msg);
Review Comment:
@ZanderXu
> How to identify this case
When the client takes longer than `dfsclientSlowLogThresholdMs` to read an ack.
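A small, self-contained sketch of that condition, for illustration only. `SlowAckDetection` and `isSlowAck` are hypothetical names, not Hadoop APIs; the timestamps are passed in explicitly instead of reading `Time.monotonicNowNanos()`, and 30000 ms is just an example threshold. One detail worth noting: `TimeUnit.NANOSECONDS.toMillis` truncates, so a read that overshoots the threshold by less than a full millisecond is not flagged.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Hadoop): mirrors the check in the patch,
// which converts elapsed nanoseconds to milliseconds and compares strictly
// against dfsclientSlowLogThresholdMs.
public class SlowAckDetection {

  // Returns true when reading the ack took longer than thresholdMs.
  // TimeUnit.toMillis truncates, so any sub-millisecond remainder is dropped.
  static boolean isSlowAck(long beginNanos, long endNanos, long thresholdMs) {
    long durationNanos = endNanos - beginNanos;
    return TimeUnit.NANOSECONDS.toMillis(durationNanos) > thresholdMs;
  }

  public static void main(String[] args) {
    long threshold = 30_000L; // example threshold in ms (assumed value)
    // Exactly at the threshold: not strictly greater, so not slow.
    System.out.println(isSlowAck(0L, TimeUnit.MILLISECONDS.toNanos(30_000L), threshold));
    // 0.999999 ms over: truncated to 30000 ms, so still not slow.
    System.out.println(isSlowAck(0L, TimeUnit.MILLISECONDS.toNanos(30_000L) + 999_999L, threshold));
    // One full millisecond over: slow.
    System.out.println(isSlowAck(0L, TimeUnit.MILLISECONDS.toNanos(30_001L), threshold));
  }
}
```

Running `main` prints `false`, `false`, `true`.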
> Which datanode should be marked as a bad or slow DN
The datanodes that are in a poor network environment.
> Maybe Datastreamer can identify this case and recovery it through PipelineRecovery
The core issue is that when the response time between the client and a DN exceeds `dfsclientSlowLogThresholdMs`, the client only prints a log and takes no further action. We should print the log and also throw an `IOException`.
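A minimal sketch of the "log, then throw" behavior, assuming a simplified standalone setting rather than the real `DataStreamer`. `SlowAckCheck` and `checkAckLatency` are hypothetical names, `java.util.logging` stands in for Hadoop's logger, and the `ack` field is omitted from the message. What happens after the exception in HDFS (for example, whether it leads to pipeline recovery) depends on the existing error handling around the `run()` method in the diff, which this sketch does not model.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Hypothetical illustration (not Hadoop code): instead of only logging a slow
// ack read, log a warning and throw IOException so the caller must react.
public class SlowAckCheck {
  private static final Logger LOG = Logger.getLogger(SlowAckCheck.class.getName());

  static void checkAckLatency(String block, long durationNanos, long thresholdMs,
      String[] targets) throws IOException {
    long durationMs = TimeUnit.NANOSECONDS.toMillis(durationNanos);
    if (durationMs > thresholdMs) {
      final String msg = "Slow ReadProcessor read fields for block " + block
          + " took " + durationMs + "ms (threshold=" + thresholdMs
          + "ms), targets: " + Arrays.asList(targets);
      LOG.warning(msg);
      // Throwing hands the slow read to the caller's error handling.
      throw new IOException(msg);
    }
  }

  public static void main(String[] args) {
    String[] targets = {"dn1:9866", "dn2:9866", "dn3:9866"}; // hypothetical DNs
    try {
      checkAckLatency("blk_1", TimeUnit.MILLISECONDS.toNanos(10), 100, targets);
      System.out.println("fast ack: no exception");
      checkAckLatency("blk_1", TimeUnit.MILLISECONDS.toNanos(500), 100, targets);
      System.out.println("not reached");
    } catch (IOException e) {
      System.out.println("slow ack: " + e.getMessage());
    }
  }
}
```

The fast read passes silently; the 500 ms read is logged at WARNING and surfaces as an `IOException` carrying the same message.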
> but I don't think your modification is a good solution.
Maybe you're right, but this may be the simplest modification. After applying this patch, we solved the slow DN problem in our production environment.
--
This is an automated message from the Apache Git Service.