[jira] [Resolved] (HDFS-16888) BlockManager#maxReplicationStreams, replicationStreamsHardLimit, blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout should be volatile

2023-01-31 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16888.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> BlockManager#maxReplicationStreams, replicationStreamsHardLimit, 
> blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout should be 
> volatile
> 
>
> Key: HDFS-16888
> URL: https://issues.apache.org/jira/browse/HDFS-16888
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Haiyang Hu
>Assignee: Haiyang Hu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> BlockManager#maxReplicationStreams, replicationStreamsHardLimit, 
> blocksReplWorkMultiplier and PendingReconstructionBlocks#timeout may be 
> written by NameNode#reconfReplicationParameters while being read by other 
> threads. 
> Thus they should be declared volatile to ensure "happens-before" 
> consistency.
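
For illustration, a minimal sketch of how volatile gives that guarantee. The
class shape and method names below are hypothetical; the real fields live in
BlockManager and PendingReconstructionBlocks:

{code:java}
// Editorial sketch: a volatile write by the reconfiguration thread
// happens-before any subsequent read by a worker thread, so readers
// always observe the freshly reconfigured value.
class ReplicationConfigSketch {
  private volatile int maxReplicationStreams = 2;
  private volatile int replicationStreamsHardLimit = 4;
  private volatile int blocksReplWorkMultiplier = 2;
  private volatile long pendingReconstructionTimeoutMs = 300_000L;

  // Writer side, analogous to NameNode#reconfReplicationParameters.
  void reconfigureMaxReplicationStreams(int newValue) {
    maxReplicationStreams = newValue;
  }

  // Reader side, analogous to the reconstruction worker threads.
  int currentMaxReplicationStreams() {
    return maxReplicationStreams;
  }
}
{code}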






[jira] [Created] (HDFS-16900) Method DataNode#isWrite does not seem to work in the DataTransfer constructor

2023-01-31 Thread ZhangHB (Jira)
ZhangHB created HDFS-16900:
--

 Summary: Method DataNode#isWrite does not seem to work in the 
DataTransfer constructor
 Key: HDFS-16900
 URL: https://issues.apache.org/jira/browse/HDFS-16900
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 3.3.4
Reporter: ZhangHB


The DataTransfer constructor contains the following code:
{code:java}
if (isTransfer(stage, clientname)) {
  this.throttler = xserver.getTransferThrottler();
} else if(isWrite(stage)) {
  this.throttler = xserver.getWriteThrottler();
} {code}
The stage is a parameter of the DataTransfer constructor. Let us see where the 
DataTransfer object is instantiated.

In the method transferReplicaForPipelineRecovery, the code looks like below:
{code:java}
final DataTransfer dataTransferTask = new DataTransfer(targets,
targetStorageTypes, targetStorageIds, b, stage, client); {code}
but the stage can never be PIPELINE_SETUP_STREAMING_RECOVERY or 
PIPELINE_SETUP_APPEND_RECOVERY here; it can only be TRANSFER_RBW or 
TRANSFER_FINALIZED. So I think the isWrite branch can never be taken, as the 
sketch below demonstrates.
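
To make the claim concrete, here is a condensed, self-contained sketch (the
enum is trimmed to the relevant constants and the check mirrors the
description above; it is not the exact Hadoop source):

{code:java}
// Condensed illustration of why the isWrite branch appears dead.
enum BlockConstructionStage {
  PIPELINE_SETUP_STREAMING_RECOVERY, PIPELINE_SETUP_APPEND_RECOVERY,
  TRANSFER_RBW, TRANSFER_FINALIZED
}

class IsWriteSketch {
  // Mirrors the isWrite(stage) check referred to above.
  static boolean isWrite(BlockConstructionStage stage) {
    return stage == BlockConstructionStage.PIPELINE_SETUP_STREAMING_RECOVERY
        || stage == BlockConstructionStage.PIPELINE_SETUP_APPEND_RECOVERY;
  }

  public static void main(String[] args) {
    // transferReplicaForPipelineRecovery only passes TRANSFER_RBW or
    // TRANSFER_FINALIZED, for which isWrite is always false, so the
    // getWriteThrottler() branch is never reached.
    System.out.println(isWrite(BlockConstructionStage.TRANSFER_RBW));       // false
    System.out.println(isWrite(BlockConstructionStage.TRANSFER_FINALIZED)); // false
  }
}
{code}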






Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86_64

2023-01-31 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1122/

[Jan 30, 2023, 5:17:04 PM] (github) HADOOP-18584. [NFS GW] Fix regression after 
netty4 migration. (#5252)




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen spotbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

spotbugs :

   module:hadoop-mapreduce-project/hadoop-mapreduce-client 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   
module:hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core
 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   module:hadoop-mapreduce-project 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 

spotbugs :

   module:root 
   Write to static field 
org.apache.hadoop.mapreduce.task.reduce.Fetcher.nextId from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:from instance method new 
org.apache.hadoop.mapreduce.task.reduce.Fetcher(JobConf, TaskAttemptID, 
ShuffleSchedulerImpl, MergeManager, Reporter, ShuffleClientMetrics, 
ExceptionReporter, SecretKey) At Fetcher.java:[line 120] 
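
All four spotbugs entries above flag the same line: an unsynchronized
increment of the static field Fetcher.nextId from the constructor. As a
general illustration of the flagged pattern and the usual remedy (not
Hadoop's actual fix):

{code:java}
import java.util.concurrent.atomic.AtomicInteger;

class StaticWriteSketch {
  // Flagged shape: a read-modify-write of a static field from instance
  // code can lose updates when objects are constructed concurrently.
  private static int nextId = 0;

  // Race-free alternative: the increment becomes one atomic operation.
  private static final AtomicInteger NEXT_ID = new AtomicInteger();

  private final int id;

  StaticWriteSketch(boolean useAtomic) {
    this.id = useAtomic ? NEXT_ID.incrementAndGet() : ++nextId;
  }
}
{code}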

Failed junit tests :

   hadoop.hdfs.TestLeaseRecovery2 
   hadoop.mapreduce.v2.hs.TestJobHistoryParsing 
   hadoop.mapreduce.v2.hs.TestJobHistoryEvents 
   hadoop.mapreduce.v2.hs.TestJobHistoryServer 
   hadoop.mapreduce.v2.TestSpeculativeExecutionWithMRApp 
  

   cc:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1122/artifact/out/results-compile-cc-root.txt
 [96K]

   javac:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1122/artifact/out/results-compile-javac-root.txt
 [528K]

   blanks:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1122/artifact/out/blanks-eol.txt
 [14M]
  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1122/artifact/out/blanks-tabs.txt
 [2.0M]

   checkstyle:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/1122/artifact/out/results-checkstyle-root.txt
 [13M]

   hadolint:

  
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java8-linux-x86_64/11

[jira] [Created] (HDFS-16901) RBF: Routers should propagate the real user in the UGI via the caller context

2023-01-31 Thread Simbarashe Dzinamarira (Jira)
Simbarashe Dzinamarira created HDFS-16901:
-

 Summary: RBF: Routers should propagate the real user in the UGI 
via the caller context
 Key: HDFS-16901
 URL: https://issues.apache.org/jira/browse/HDFS-16901
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Simbarashe Dzinamarira


If the router receives an operation from a proxyUser, it drops the realUser in 
the UGI and makes the routerUser the realUser for the operation that goes to 
the namenode.

In the namenode UGI logs, we'd like the ability to know the original realUser.

The router should propagate the realUser from the client call as part of the 
callerContext.
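
A hedged sketch of what that propagation could look like on the router side;
the context key "realUser" and the builder usage are assumptions for
illustration, not the committed change:

{code:java}
import org.apache.hadoop.ipc.CallerContext;
import org.apache.hadoop.security.UserGroupInformation;

class RouterRealUserSketch {
  // Before forwarding a proxy-user call to the namenode, record the
  // original realUser in the caller context so the namenode audit log
  // can recover it even though the router becomes the RPC realUser.
  static void tagRealUser(UserGroupInformation callerUgi) {
    UserGroupInformation realUser = callerUgi.getRealUser();
    if (realUser != null) {
      CallerContext current = CallerContext.getCurrent();
      CallerContext updated = new CallerContext.Builder(
          current == null ? null : current.getContext())
          .append("realUser", realUser.getShortUserName()) // hypothetical key
          .build();
      CallerContext.setCurrent(updated);
    }
  }
}
{code}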






Apache Hadoop qbt Report: branch-2.10+JDK7 on Linux/x86_64

2023-01-31 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-branch-2.10-java7-linux-x86_64/924/

No changes




-1 overall


The following subsystems voted -1:
asflicense hadolint mvnsite pathlen unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.fs.TestFileUtil 
   hadoop.hdfs.shortcircuit.TestShortCircuitCache 
   hadoop.hdfs.TestQuota 
   hadoop.hdfs.TestLargeBlock 
   hadoop.security.TestRefreshUserMappings 
   hadoop.hdfs.server.namenode.snapshot.TestFileContextSnapshot 
   hadoop.hdfs.server.namenode.TestNameNodeStatusMXBean 
   hadoop.hdfs.server.namenode.web.resources.TestWebHdfsDataLocality 
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.server.namenode.TestBlockPlacementPolicyRackFaultTolerant 
   hadoop.hdfs.server.namenode.TestNameNodeRetryCacheMetrics 
   hadoop.hdfs.TestWriteConfigurationToDFS 
   hadoop.hdfs.TestDFSShell 
   hadoop.hdfs.TestDisableConnCache 
   hadoop.metrics2.sink.TestRollingFileSystemSinkWithSecureHdfs 
   hadoop.hdfs.server.namenode.TestDiskspaceQuotaUpdate 
   hadoop.hdfs.server.namenode.TestEditLog 
   hadoop.hdfs.server.namenode.ha.TestDelegationTokensWithHA 
   hadoop.hdfs.server.namenode.ha.TestStandbyInProgressTail 
   
hadoop.hdfs.server.namenode.snapshot.TestSnapshotNameWithInvalidCharacters 
   hadoop.hdfs.server.namenode.TestListOpenFiles 
   hadoop.hdfs.server.namenode.TestNamenodeCapacityReport 
   hadoop.hdfs.server.namenode.ha.TestSeveralNameNodes 
   
hadoop.hdfs.server.namenode.snapshot.TestINodeFileUnderConstructionWithSnapshot 
   hadoop.hdfs.TestDecommission 
   hadoop.hdfs.server.mover.TestMover 
   hadoop.hdfs.server.namenode.TestMetadataVersionOutput 
   hadoop.hdfs.server.namenode.ha.TestHAAppend 
   hadoop.hdfs.server.namenode.ha.TestDNFencing 
   hadoop.hdfs.server.namenode.TestStorageRestore 
   hadoop.hdfs.TestDatanodeStartupFixesLegacyStorageIDs 
   hadoop.hdfs.server.namenode.TestUpgradeDomainBlockPlacementPolicy 
   hadoop.hdfs.server.namenode.ha.TestStandbyIsHot 
   hadoop.hdfs.TestRollingUpgradeRollback 
   hadoop.hdfs.server.namenode.ha.TestStandbyBlockManagement 
   hadoop.hdfs.server.namenode.TestGetContentSummaryWithPermission 
   hadoop.hdfs.TestByteBufferPread 
   hadoop.hdfs.server.namenode.snapshot.TestSnapshot 
   hadoop.hdfs.server.namenode.ha.TestDNFencingWithReplication 
   hadoop.tracing.TestTraceAdmin 
   hadoop.hdfs.server.namenode.TestAuditLoggerWithCommands 
   hadoop.hdfs.server.namenode.ha.TestEditLogsDuringFailover 
   hadoop.tracing.TestTracing 
   hadoop.hdfs.TestDataStream 
   hadoop.hdfs.TestDeadNodeDetection 
   hadoop.hdfs.server.namenode.TestMetaSave 
   hadoop.hdfs.server.datanode.TestBatchIbr 
   hadoop.hdfs.server.datanode.TestBlockCountersInPendingIBR 
   hadoop.hdfs.server.namenode.TestLargeDirectoryDelete 
   hadoop.hdfs.server.namenode.TestMalformedURLs 
   hadoop.hdfs.TestTrashWithEncryptionZones 
   hadoop.hdfs.server.namenode.ha.TestUpdateBlockTailing 
   hadoop.hdfs.server.namenode.ha.TestObserverNode 
   hadoop.hdfs.server.namenode.ha.TestNNHealthCheck 
   hadoop.TestRefreshCallQueue 
   hadoop.hdfs.server.namenode.TestFileContextAcl 
   hadoop.hdfs.server.namenode.ha.TestBootstrapStandby 
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotStatsMXBean 
   hadoop.hdfs.server.namenode.ha.TestHAMetrics 
   hadoop.hdfs.TestFileAppendRestart 
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotListing 
   hadoop.hdfs.TestParallelShortCircuitReadNoChecksum 
   hadoop.hdfs.shortcircuit.TestShortCircuitLocalRead 
   hadoop.hdfs.server.namenode.ha.TestMultiObserverNode 
   hadoop.hdfs.server.namenode.ha.TestPendingCorruptDnMessages 
   hadoop.hdfs.TestListFilesInDFS 
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotDeletion 
   hadoop.hdfs.server.namenode.TestDecommissioningStatus 
   hadoop.cli.TestCacheAdminCLI 
   hadoop.hdfs.TestEncryptedTransfer 
   hadoop.hdfs.TestHAAuxiliaryPort 
   hadoop.cli.TestAclCLI 
   hadoop.hdfs.server.namenode.TestEditLogRace 
   hadoop.hdfs.server.namenode.TestSecondaryNameNodeUpgrade 
   hadoop.hdfs.server.namenode.TestHDFSConcat 
   hadoop.TestGenericRefresh 
   hadoop.hdfs.server.namenode.TestAuditLogs 
   hadoop.hdfs.server.namenode.snapshot.TestSnapshotFileLength 
   
hadoop.hdfs.server.blockmanagement.TestReplicationPolicyWithUpgradeDomain 
   hadoop.hdfs.server.namenode.snapshot.TestRenameWithSnapshots 
   hadoop.hdfs.server.namenode.TestParallelImageWrite 
  

Apache Hadoop qbt Report: trunk+JDK11 on Linux/x86_64

2023-01-31 Thread Apache Jenkins Server
For more details, see 
https://ci-hadoop.apache.org/job/hadoop-qbt-trunk-java11-linux-x86_64/436/

[Jan 30, 2023, 5:17:04 PM] (github) HADOOP-18584. [NFS GW] Fix regression after 
netty4 migration. (#5252)




-1 overall


The following subsystems voted -1:
blanks hadolint pathlen spotbugs unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/resources/xml/external-dtd.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/fair-scheduler-invalid.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/src/test/resources/yarn-site-with-invalid-allocation-file-ref.xml
 

spotbugs :

   module:hadoop-hdfs-project/hadoop-hdfs 
   Redundant nullcheck of oldLock, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory))
 Redundant null check at DataStorage.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.DataStorage.isPreUpgradableLayout(Storage$StorageDirectory))
 Redundant null check at DataStorage.java:[line 695] 
   Redundant nullcheck of metaChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long,
 FileInputStream, FileChannel, String) Redundant null check at 
MappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MappableBlockLoader.verifyChecksum(long,
 FileInputStream, FileChannel, String) Redundant null check at 
MappableBlockLoader.java:[line 138] 
   Redundant nullcheck of blockChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at MemoryMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.MemoryMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at MemoryMappableBlockLoader.java:[line 75] 
   Redundant nullcheck of blockChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at NativePmemMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.load(long,
 FileInputStream, FileInputStream, String, ExtendedBlockId) Redundant null 
check at NativePmemMappableBlockLoader.java:[line 85] 
   Redundant nullcheck of metaChannel, which is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$$PmemMappedRegion,,
 long, FileInputStream, FileChannel, String) Redundant null check at 
NativePmemMappableBlockLoader.java:is known to be non-null in 
org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.NativePmemMappableBlockLoader.verifyChecksumAndMapBlock(NativeIO$POSIX$$PmemMappedRegion,,
 long, FileInputStream, FileChannel, String) Redundant null check at 
NativePmemMappableBlockLoader.java:[line 130] 
   
org.apache.hadoop.hdfs.server.namenode.top.window.RollingWindowManager$UserCounts
  doesn't override java.util.ArrayList.equals(Object) At 
RollingWindowManager.java:At RollingWindowManager.java:[line 1] 
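
The redundant-nullcheck entries above all share one shape (the SpotBugs
RCN_REDUNDANT_NULLCHECK_OF_NONNULL_VALUE family): a value is dereferenced, or
otherwise known non-null, before a later null check. A simplified
illustration, not the actual Hadoop code:

{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;

class RedundantNullcheckSketch {
  static long size(FileInputStream in) throws IOException {
    FileChannel ch = in.getChannel();
    long n = ch.size(); // unconditional dereference: ch is non-null here
    // ...so this null check can never fail; SpotBugs flags it as redundant.
    if (ch != null) {
      return n;
    }
    return -1;
  }
}
{code}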

spotbugs :

   module:hadoop-yarn-project/hadoop-yarn 
   Redundant nullcheck of it, which is known to be non-null in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.recoverTrackerResources(LocalResourcesTracker,
 NMStateStoreService$LocalResourceTrackerState)) Redundant null check at 
ResourceLocalizationService.java:is known to be non-null in 
org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.recoverTrackerResources(LocalResourcesTracker,
 NMStateStoreService$LocalResourceTrack

[jira] [Resolved] (HDFS-16821) Fix regression in HDFS-13522 that enables observer reads by default.

2023-01-31 Thread Takanobu Asanuma (Jira)


 [ 
https://issues.apache.org/jira/browse/HDFS-16821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takanobu Asanuma resolved HDFS-16821.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> Fix regression in HDFS-13522 that enables observer reads by default.
> 
>
> Key: HDFS-16821
> URL: https://issues.apache.org/jira/browse/HDFS-16821
> Project: Hadoop HDFS
>  Issue Type: Bug
>  Components: hdfs
>Reporter: Simbarashe Dzinamarira
>Assignee: Simbarashe Dzinamarira
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Serving reads consistently from Observer Namenodes is a feature that was 
> introduced in HDFS-12943.
> Clients opt in to this feature by configuring the ObserverReadProxyProvider. 
> It is important that the opt-in is explicit because for third-party reads to 
> remain consistent, these clients then need to perform an msync before reads.
> In HDFS-13522, the ClientGSIContext is implicitly added to the DFSClient, thus 
> enabling Observer reads for all clients by default. This breaks consistency 
> guarantees for clients that haven't opted into observer reads.
> [https://github.com/apache/hadoop/pull/4883/files#diff-a627e2c1f3e68235520d3c28092f4ae8a41aa4557cc530e4e6862c318be7e898R352-R354]
> We need to return to the old behavior of only using the ClientGSIContext when 
> users have explicitly opted into Observer reads.
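
For reference, explicit opt-in means setting the proxy provider for the
client's nameservice. A minimal sketch: the configuration key and provider
class below are the standard ones, but the helper itself is hypothetical:

{code:java}
import org.apache.hadoop.conf.Configuration;

class ObserverReadOptInSketch {
  // Clients that set ObserverReadProxyProvider are opting in to observer
  // reads and are expected to msync so third-party reads stay consistent.
  static Configuration optIn(Configuration conf, String nameservice) {
    conf.set("dfs.client.failover.proxy.provider." + nameservice,
        "org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider");
    return conf;
  }
}
{code}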






[jira] [Created] (HDFS-16902) Add Namenode status to BPServiceActor metrics and improve logging in offerservice

2023-01-31 Thread Viraj Jasani (Jira)
Viraj Jasani created HDFS-16902:
---

 Summary: Add Namenode status to BPServiceActor metrics and improve 
logging in offerservice
 Key: HDFS-16902
 URL: https://issues.apache.org/jira/browse/HDFS-16902
 Project: Hadoop HDFS
  Issue Type: Task
Reporter: Viraj Jasani
Assignee: Viraj Jasani


Recently came across a k8s environment where, at random, some datanode pods are 
not able to stay connected to all namenode pods (e.g. the last heartbeat time 
sometimes stays higher than 2 hr). When a new namenode becomes active, any 
datanode that is not heartbeating to it cannot send any further block reports, 
sometimes leading to missing replicas, which is resolved only by a datanode pod 
restart.

While the issue seems env-specific, BPServiceActor's offer service could use 
some logging improvements. It would also be good to expose the namenode status 
in BPServiceActorInfo, to identify any lag on the datanode side in recognizing 
the updated active namenode status via heartbeats.
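
As a rough sketch of the metrics side, the per-actor info map could carry the
last-known HA state of the namenode the actor heartbeats to; the key names
below are illustrative, not the committed ones:

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

class BPServiceActorInfoSketch {
  // Exposing the namenode HA state alongside existing per-actor info makes
  // a datanode that lags in recognizing the new active namenode visible.
  static Map<String, String> actorInfo(String nnAddress, String haState,
      long lastHeartbeatSecs) {
    Map<String, String> info = new LinkedHashMap<>();
    info.put("NamenodeAddress", nnAddress);
    info.put("NamenodeHaServiceState", haState); // e.g. "active" / "standby"
    info.put("LastHeartbeat", String.valueOf(lastHeartbeatSecs));
    return info;
  }
}
{code}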


