[jira] [Created] (HDFS-14701) Change Log Level to warn in SlotReleaser
Lisheng Sun created HDFS-14701:
--

Summary: Change Log Level to warn in SlotReleaser
Key: HDFS-14701
URL: https://issues.apache.org/jira/browse/HDFS-14701
Project: Hadoop HDFS
Issue Type: Improvement
Reporter: Lisheng Sun

{code:java}
@Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
       DataOutputStream out = new DataOutputStream(
           new BufferedOutputStream(sock.getOutputStream()))) {
    new Sender(out).releaseShortCircuitFds(slot.getSlotId());
    DataInputStream in = new DataInputStream(sock.getInputStream());
    ReleaseShortCircuitAccessResponseProto resp =
        ReleaseShortCircuitAccessResponseProto.parseFrom(
            PBHelperClient.vintPrefixed(in));
    if (resp.getStatus() != Status.SUCCESS) {
      String error = resp.hasError() ? resp.getError() : "(unknown)";
      throw new IOException(resp.getStatus().toString() + ": " + error);
    }
    LOG.trace("{}: released {}", this, slot);
    success = true;
  } catch (IOException e) {
    LOG.error(ShortCircuitCache.this + ": failed to release "
        + "short-circuit shared memory slot " + slot + " by sending "
        + "ReleaseShortCircuitAccessRequestProto to " + path
        + ". Closing shared memory segment.", e);
  } finally {
    if (success) {
      shmManager.freeSlot(slot);
    } else {
      shm.getEndpointShmManager().shutdown(shm);
    }
  }
}
{code}

2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: ShortCircuitCache(0x65849546): failed to release short-circuit shared memory slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by sending ReleaseShortCircuitAccessRequestProto to /home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket. Closing shared memory segment.
java.io.IOException: ERROR_INVALID: there is no shared memory segment registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
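Going by the issue title, the proposal is to log this failure at WARN instead of ERROR. A minimal sketch of the change to the catch block above, assuming the intent is only the level change:

{code:java}
} catch (IOException e) {
  // Slot release is best-effort: the finally block already shuts down the
  // shared memory segment on failure (as seen in the ERROR_INVALID log
  // above, the segment may simply no longer be registered), so WARN
  // arguably fits better than ERROR here.
  LOG.warn(ShortCircuitCache.this + ": failed to release "
      + "short-circuit shared memory slot " + slot + " by sending "
      + "ReleaseShortCircuitAccessRequestProto to " + path
      + ". Closing shared memory segment.", e);
}
{code}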
Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/

No changes

-1 overall

The following subsystems voted -1:
   asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
   cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
   unit

Specific tests:

   XML : Parsing Error(s):
      hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
      hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml

   FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
      Boxed value is unboxed and then immediately reboxed in org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result, byte[], byte[], KeyConverter, ValueConverter, boolean) At ColumnRWHelper.java:[line 335]

   Failed junit tests:
      hadoop.hdfs.server.namenode.ha.TestBootstrapStandby
      hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes
      hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure
      hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency
      hadoop.hdfs.server.datanode.TestDirectoryScanner
      hadoop.hdfs.server.datanode.TestDataNodeUUID
      hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys
      hadoop.hdfs.server.namenode.TestNameNodeHttpServerXFrame
      hadoop.yarn.client.api.impl.TestAMRMProxy
      hadoop.yarn.client.api.impl.TestNMClient
      hadoop.registry.secure.TestSecureLogins
      hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector
      hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2

   cc: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt [4.0K]
   javac: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt [328K]
   cc: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-compile-cc-root-jdk1.8.0_222.txt [4.0K]
   javac: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-compile-javac-root-jdk1.8.0_222.txt [308K]
   checkstyle: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-checkstyle-root.txt [16M]
   hadolint: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-patch-hadolint.txt [4.0K]
   pathlen: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/pathlen.txt [12K]
   pylint: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-patch-pylint.txt [24K]
   shellcheck: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-patch-shellcheck.txt [72K]
   shelldocs: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-patch-shelldocs.txt [8.0K]
   whitespace: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/whitespace-eol.txt [12M]
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/whitespace-tabs.txt [1.2M]
   xml: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/xml.txt [12K]
   findbugs: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html [8.0K]
   javadoc: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt [16K]
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_222.txt [1.1M]
   unit: https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt [324K]
      https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-
[jira] [Created] (HDFS-14702) Datanode.ReplicaMap memory leak
He Xiaoqiao created HDFS-14702:
--

Summary: Datanode.ReplicaMap memory leak
Key: HDFS-14702
URL: https://issues.apache.org/jira/browse/HDFS-14702
Project: Hadoop HDFS
Issue Type: Bug
Components: datanode
Affects Versions: 2.7.1
Reporter: He Xiaoqiao

DataNode memory is occupied by ReplicaMap entries, causing high-frequency GC and degraded write performance. There are about 600K block replicas on the DataNode, but a heap dump shows over 8M ReplicaMap items with a footprint over 500MB. This looks like a memory leak. One more note on the situation: the block read/write load is very high. We have not tested HDFS-8859 and have no idea whether it would solve this issue.
[jira] [Created] (HDDS-1910) Cannot build hadoop-hdds-config from scratch in IDEA
Doroszlai, Attila created HDDS-1910:
---

Summary: Cannot build hadoop-hdds-config from scratch in IDEA
Key: HDDS-1910
URL: https://issues.apache.org/jira/browse/HDDS-1910
Project: Hadoop Distributed Data Store
Issue Type: Bug
Components: build
Reporter: Doroszlai, Attila
Assignee: Doroszlai, Attila

Building {{hadoop-hdds-config}} from scratch (e.g. right after checkout or after {{mvn clean}}) in IDEA fails with the following error:

{code}
Error:java: Bad service configuration file, or exception thrown while constructing Processor object: javax.annotation.processing.Processor: Provider org.apache.hadoop.hdds.conf.ConfigFileGenerator not found
{code}
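Some background on why this fails: {{javax.annotation.processing.Processor}} implementations are discovered via the JDK {{ServiceLoader}} mechanism, so the entry in {{META-INF/services}} can only be resolved once the provider class has been compiled. On a fresh checkout the class file for {{ConfigFileGenerator}} does not exist yet, and the module is compiled with its own not-yet-built processor on the processor path. The snippet below illustrates the discovery step only; it is not IDEA's actual code:

{code:java}
import java.util.ServiceLoader;
import javax.annotation.processing.Processor;

public class ProcessorDiscovery {
  public static void main(String[] args) {
    // The compiler resolves each line of
    // META-INF/services/javax.annotation.processing.Processor to a class on
    // the processor path. If the named class has not been compiled yet,
    // iteration throws ServiceConfigurationError ("Provider ... not found"),
    // which surfaces as the IDEA error quoted above.
    ServiceLoader<Processor> processors = ServiceLoader.load(Processor.class);
    for (Processor p : processors) {
      System.out.println("Found annotation processor: " + p.getClass().getName());
    }
  }
}
{code}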
[jira] [Reopened] (HDFS-14674) [SBN read] Got an unexpected txid when tail editlog
[ https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei-Chiu Chuang reopened HDFS-14674:

[~wangzhaohui] this patch is not yet committed. Reopening this jira for now.

> [SBN read] Got an unexpected txid when tail editlog
> ---
>
> Key: HDFS-14674
> URL: https://issues.apache.org/jira/browse/HDFS-14674
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Reporter: wangzhaohui
> Assignee: wangzhaohui
> Priority: Blocker
> Attachments: HDFS-14674-001.patch, HDFS-14674-003.patch, HDFS-14674-004.patch, image-2019-07-26-11-34-23-405.png
>
> Add the following configuration:
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit log: 1/20512836 transactions completed. (0%)
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Edits file http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH, http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH, http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH of size 3126782311 edits # 500 loaded in 3 seconds
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc expecting start txid #232056752162
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Start loading edits file http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH, http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH, http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH maxTxnipsToRead = 500
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Fast-forwarding stream 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH, http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH, http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH' to transaction ID 232056751662
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] ip: Fast-forwarding stream 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH' to transaction ID 232056751662
> [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit log tailer] : Unknown error encountered while tailing edits. Shutting down standby NN.
> java.io.IOException: There appears to be a gap in the edit log. We expected txid 232056752162, but got txid 232077264498.
> at org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
> at org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
> at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
> at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
> at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
> [2019-07-17T11:50:21.064+08:00] [INFO] [Edit log tailer] : Exiting with status 1
> [2019-07-17T11:50:21.066+08:00] [INFO] [Thread-1] : SHUTDOWN_MSG:
> / SHUTDOWN_MSG: Shutting down NameNode at ip
> /
> {code}
>
> If dfs.ha.tail-edits.max-txns-per-lock is set to 500, then once the NameNode has loaded 500 transactions from the current edit log it moves on to the next edit log, even though the current one contains more than 500 transactions. As a result, the NameNode gets an unexpected txid when tailing the edit log.
>
> {code:java}
> //
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Edits file
> htt
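To make the reported mismatch concrete, here is a toy illustration (not the actual EditLogTailer code) using the txids from the log above: after loading a batch capped by dfs.ha.tail-edits.max-txns-per-lock, the tailer expects the next stream to continue at lastLoadedTxId + 1, but the stream it actually opens starts at a later segment's first txid, so the gap check throws.

{code:java}
public class EditLogGapDemo {
  public static void main(String[] args) {
    // Values taken from the log above.
    long expectedTxId = 232056752162L;    // lastLoadedTxId + 1
    long streamFirstTxId = 232077264498L; // first txid of the stream opened next

    // Simplified version of the gap check that shut down the standby NN.
    if (streamFirstTxId != expectedTxId) {
      throw new IllegalStateException(String.format(
          "There appears to be a gap in the edit log. "
              + "We expected txid %d, but got txid %d.",
          expectedTxId, streamFirstTxId));
    }
  }
}
{code}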
Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86
For more details, see https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/

[Aug 4, 2019 5:26:25 AM] (bharat) HDDS-1870. ConcurrentModification at PrometheusMetricsSink (#1179)
[Aug 4, 2019 5:33:01 AM] (bharat) HDDS-1896. Suppress WARN log from NetworkTopology#getDistanceCost.

-1 overall

The following subsystems voted -1:
   asflicense findbugs hadolint pathlen unit xml

The following subsystems voted -1 but were configured to be filtered/ignored:
   cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace

The following subsystems are considered long running:
(runtime bigger than 1h 0m 0s)
   unit

Specific tests:

   XML : Parsing Error(s):
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
      hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml

   FindBugs : module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
      Class org.apache.hadoop.applications.mawo.server.common.TaskStatus implements Cloneable but does not define or use clone method At TaskStatus.java:[lines 39-346]
      Equals method for org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument is of type WorkerId At WorkerId.java:[line 114]
      org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does not check for null argument At WorkerId.java:[lines 114-115]

   Failed junit tests:
      hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized
      hadoop.hdfs.TestReconstructStripedFile
      hadoop.hdfs.server.datanode.TestLargeBlockReport
      hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation
      hadoop.fs.http.server.TestHttpFSServer
      hadoop.hdfs.server.federation.router.TestRouterWithSecureStartup
      hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken
      hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer

   cc: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-compile-cc-root.txt [4.0K]
   javac: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-compile-javac-root.txt [332K]
   checkstyle: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-checkstyle-root.txt [17M]
   hadolint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-patch-hadolint.txt [4.0K]
   pathlen: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/pathlen.txt [12K]
   pylint: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-patch-pylint.txt [220K]
   shellcheck: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-patch-shellcheck.txt [20K]
   shelldocs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-patch-shelldocs.txt [44K]
   whitespace: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/whitespace-eol.txt [9.6M]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/whitespace-tabs.txt [1.1M]
   xml: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/xml.txt [16K]
   findbugs: https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-mawo_hadoop-yarn-applications-mawo-core-warnings.html [8.0K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-hdds_container-service.txt [8.0K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-hdds_server-scm.txt [8.0K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-ozone_client.txt [8.0K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-ozone_objectstore-service.txt [4.0K]
      https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-ozone_ozo
Topics for Hadoop storage online sync
Hello!

For this week's community online sync (English, Wednesday 9am US Pacific Time), we will have CR Hota from Uber talk about the latest updates in Router Based Federation. He will touch on the following topics:

1. Security (development and ZooKeeper scale-testing learnings)
2. Isolation for multiple clusters
3. Routers for Observer NameNodes (their internal design; the open source implementation is yet to be done)
4. DNS support

In case you missed past community online syncs, here's the information to access (Zoom) and the meeting notes: https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit

I am also looking for topics, and maybe demos, for the upcoming Mandarin community sync call this week and in the future, so definitely reach out to me so we can announce them in advance.

Thanks all!
Weichiu
HDFS At Rest Encryption (Encryption Zone, KMS) Improvements
There is a bunch of stuff that could use some attention to improve HDFS at-rest encryption (aka transparent data encryption). Here's a spreadsheet of features, bug fixes, and improvements: https://docs.google.com/spreadsheets/d/13oeUy0Mvmq6Ngvw9IjkY9KXiQyoRIcRlijKXG1YqGug/edit?usp=sharing

I am planning to spend the next few months focusing on KMS. If you are looking for something to contribute, if you care about data encryption, or if you have an EZ/KMS patch pending review, now's a good time.

Best,
Weichiu
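For contributors new to the feature, the basic workflow is: create an encryption key in the KMS, then mark an empty directory as an encryption zone keyed by it; files under the zone are then encrypted on write and decrypted on read transparently. A minimal client-side sketch — the KMS URI, path, and key name below are made up for illustration:

{code:java}
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.crypto.key.KeyProvider;
import org.apache.hadoop.crypto.key.KeyProviderFactory;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsAdmin;

public class EncryptionZoneDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical KMS address; in a real deployment this comes from
    // hadoop.security.key.provider.path in the cluster configuration.
    URI kmsUri = URI.create("kms://http@kms.example.com:9600/kms");

    // 1. Create an encryption key in the KMS.
    KeyProvider provider = KeyProviderFactory.get(kmsUri, conf);
    provider.createKey("myKey", new KeyProvider.Options(conf));
    provider.flush();

    // 2. Turn an empty directory into an encryption zone keyed by "myKey".
    FileSystem fs = FileSystem.get(conf);
    Path zone = new Path("/secure");
    fs.mkdirs(zone);
    HdfsAdmin admin = new HdfsAdmin(fs.getUri(), conf);
    admin.createEncryptionZone(zone, "myKey");

    // Files written under /secure are now encrypted transparently, with
    // per-file EDEKs handled by the NameNode and KMS.
  }
}
{code}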
[jira] [Created] (HDDS-1911) Support Prefix ACL operations for OM HA.
Bharat Viswanadham created HDDS-1911:

Summary: Support Prefix ACL operations for OM HA.
Key: HDDS-1911
URL: https://issues.apache.org/jira/browse/HDDS-1911
Project: Hadoop Distributed Data Store
Issue Type: Sub-task
Reporter: Bharat Viswanadham
Assignee: Bharat Viswanadham

+HDDS-1541+ adds 4 new APIs to the Ozone RPC client. The OM HA implementation needs to handle them.
[jira] [Created] (HDDS-1912) start-ozone.sh fail due to ozone-config.sh not found
kevin su created HDDS-1912:
--

Summary: start-ozone.sh fail due to ozone-config.sh not found
Key: HDDS-1912
URL: https://issues.apache.org/jira/browse/HDDS-1912
Project: Hadoop Distributed Data Store
Issue Type: Improvement
Components: Ozone CLI
Affects Versions: 0.5.0
Reporter: kevin su
Fix For: 0.5.0

I want to run Ozone on its own, but start-ozone.sh always looks for ozone-config.sh under *$HADOOP_HOME*/libexec first, and fails if the file is not found there. We should look for this file under both *$HADOOP_HOME*/libexec and *$OZONE_HOME*/libexec.
[jira] [Created] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning
Konstantin Shvachko created HDFS-14703:
--

Summary: NameNode Fine-Grained Locking via Metadata Partitioning
Key: HDFS-14703
URL: https://issues.apache.org/jira/browse/HDFS-14703
Project: Hadoop HDFS
Issue Type: Improvement
Components: hdfs, namenode
Reporter: Konstantin Shvachko

We aim to enable fine-grained locking by splitting the in-memory namespace into multiple partitions, each having a separate lock. This is intended to improve the performance of NameNode write operations.
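The JIRA states only the goal, but the core idea of partitioned locking can be illustrated with a small, hypothetical stripe-lock sketch (this is not the design proposed in HDFS-14703, just the general technique): map each inode to one of N read-write locks so that writers touching different partitions no longer serialize on the single global namespace lock.

{code:java}
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical striped namespace lock, for illustration only. */
public class StripedNamespaceLock {
  private final ReentrantReadWriteLock[] stripes;

  public StripedNamespaceLock(int numPartitions) {
    stripes = new ReentrantReadWriteLock[numPartitions];
    for (int i = 0; i < numPartitions; i++) {
      stripes[i] = new ReentrantReadWriteLock(true);
    }
  }

  // Spread inode ids across partitions. A real design would more likely
  // partition by subtree or key range so related inodes share a lock.
  private ReentrantReadWriteLock stripeFor(long inodeId) {
    return stripes[(int) Math.floorMod(inodeId, (long) stripes.length)];
  }

  public void writeLock(long inodeId)   { stripeFor(inodeId).writeLock().lock(); }
  public void writeUnlock(long inodeId) { stripeFor(inodeId).writeLock().unlock(); }
  public void readLock(long inodeId)    { stripeFor(inodeId).readLock().lock(); }
  public void readUnlock(long inodeId)  { stripeFor(inodeId).readLock().unlock(); }
}
{code}

Operations spanning partitions (e.g. a rename across subtrees) would need to acquire multiple stripe locks in a fixed order to stay deadlock-free, which is presumably one of the harder questions the JIRA will have to settle.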
[jira] [Created] (HDDS-1913) Fix OzoneBucket and RpcClient APIS for acl
Bharat Viswanadham created HDDS-1913:

Summary: Fix OzoneBucket and RpcClient APIS for acl
Key: HDDS-1913
URL: https://issues.apache.org/jira/browse/HDDS-1913
Project: Hadoop Distributed Data Store
Issue Type: Bug
Reporter: Bharat Viswanadham
Assignee: Bharat Viswanadham

Fix addAcl/removeAcl in OzoneBucket to use the newly added ACL APIs addAcl/removeAcl from HDDS-1739. Remove addBucketAcls and removeBucketAcls from RpcClient; we should use addAcl/removeAcl instead.
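Very roughly, the refactoring described here would have OzoneBucket's bucket-level ACL methods delegate to the generic object-ACL calls. The sketch below is inferred from the issue text only; the class shapes, builder methods, and field names are assumptions, not the actual Ozone source:

{code:java}
// Hypothetical sketch inside OzoneBucket (volumeName, name, and proxy are
// assumed fields); replaces the removed RpcClient.addBucketAcls(...) path
// with the generic addAcl added by HDDS-1739.
public boolean addAcl(OzoneAcl acl) throws IOException {
  OzoneObj bucketObj = OzoneObjInfo.Builder.newBuilder()
      .setVolumeName(volumeName)
      .setBucketName(name)
      .setResType(OzoneObj.ResourceType.BUCKET)
      .setStoreType(OzoneObj.StoreType.OZONE)
      .build();
  return proxy.addAcl(bucketObj, acl);
}
{code}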
[jira] [Created] (HDFS-14704) RBF:NnId should not be null in NamenodeHeartbeatService
xuzq created HDFS-14704:
---

Summary: RBF:NnId should not be null in NamenodeHeartbeatService
Key: HDFS-14704
URL: https://issues.apache.org/jira/browse/HDFS-14704
Project: Hadoop HDFS
Issue Type: Improvement
Components: rbf
Reporter: xuzq

NnId should not be null in NamenodeHeartbeatService. If NnId is null, it prints an error message like:

{code:java}
2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService (NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception updating NN registration for ns1:null
java.lang.NullPointerException
at org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
at org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
at org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
at org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267)
at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
at org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
at org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
{code}
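One straightforward mitigation is to validate the namenode ID before building the registration record, so the service logs a clear message instead of an NPE. This is a hypothetical sketch, not the patch attached to the issue:

{code:java}
// Hypothetical guard at the top of NamenodeHeartbeatService's update path;
// the variable names mirror the log above, but the actual fix may differ.
private void updateState(NamenodeStatusReport report) {
  if (nnId == null || nnId.isEmpty()) {
    LOG.error("Namenode ID is missing for nameservice {}; "
        + "skipping NN registration update", nsId);
    return;
  }
  // ... build the MembershipState and register the namenode as before ...
}
{code}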
[jira] [Reopened] (HDFS-14652) HealthMonitor connection retry times should be configurable
[ https://issues.apache.org/jira/browse/HDFS-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chen Zhang reopened HDFS-14652:
---

missed properties in core-default.xml

> HealthMonitor connection retry times should be configurable
> ---
>
> Key: HDFS-14652
> URL: https://issues.apache.org/jira/browse/HDFS-14652
> Project: Hadoop HDFS
> Issue Type: Improvement
> Reporter: Chen Zhang
> Assignee: Chen Zhang
> Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14652-001.patch, HDFS-14652-002.patch
>
> On our production HDFS cluster, a burst of client requests filled the TCP kernel queue on the NameNode's host. Since net.ipv4.tcp_syn_retries is set to 1 in our environment, after 3 seconds the ZooKeeper HealthMonitor got a connection error like this:
> {code:java}
> WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to monitor health of NameNode at nn_host_name/ip_address:port: Call From zkfc_host_name/ip to nn_host_name:port failed on connection exception: java.net.ConnectException: Connection timed out; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
> {code}
> This error caused a failover and affected the availability of that cluster. We fixed the issue by enlarging the kernel parameter net.ipv4.tcp_syn_retries to 6.
> While working on this issue, though, we found that the HealthMonitor's connection retry count (ipc.client.connect.max.retries) is hard-coded as 1. I think it should be configurable: if we don't want the HealthMonitor to be so sensitive, we can change its behavior through this configuration.
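A hedged sketch of what making the retry count configurable could look like — the new config key name here is invented for illustration; the committed patch may use a different one:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeysPublic;

public class HealthMonitorRetryConfig {
  // Hypothetical key name, invented for illustration.
  static final String HM_CONNECT_RETRIES_KEY =
      "ha.health-monitor.connect.max.retries";

  static Configuration buildHealthMonitorConf(Configuration conf) {
    // Copy the conf and replace the previously hard-coded value of 1 with a
    // user-configurable retry count for the health-check IPC client.
    Configuration targetConf = new Configuration(conf);
    int retries = conf.getInt(HM_CONNECT_RETRIES_KEY, 1);
    targetConf.setInt(
        CommonConfigurationKeysPublic.IPC_CLIENT_CONNECT_MAX_RETRIES_KEY,
        retries);
    return targetConf;
  }
}
{code}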