[jira] [Created] (HDFS-14701) Change Log Level to warn in SlotReleaser

2019-08-05 Thread Lisheng Sun (JIRA)
Lisheng Sun created HDFS-14701:
--

 Summary: Change Log Level to warn in SlotReleaser
 Key: HDFS-14701
 URL: https://issues.apache.org/jira/browse/HDFS-14701
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Lisheng Sun


{code:java}
@Override
public void run() {
  LOG.trace("{}: about to release {}", ShortCircuitCache.this, slot);
  final DfsClientShm shm = (DfsClientShm)slot.getShm();
  final DomainSocket shmSock = shm.getPeer().getDomainSocket();
  final String path = shmSock.getPath();
  boolean success = false;
  try (DomainSocket sock = DomainSocket.connect(path);
       DataOutputStream out = new DataOutputStream(
           new BufferedOutputStream(sock.getOutputStream()))) {
    new Sender(out).releaseShortCircuitFds(slot.getSlotId());
    DataInputStream in = new DataInputStream(sock.getInputStream());
    ReleaseShortCircuitAccessResponseProto resp =
        ReleaseShortCircuitAccessResponseProto.parseFrom(
            PBHelperClient.vintPrefixed(in));
    if (resp.getStatus() != Status.SUCCESS) {
      String error = resp.hasError() ? resp.getError() : "(unknown)";
      throw new IOException(resp.getStatus().toString() + ": " + error);
    }
    LOG.trace("{}: released {}", this, slot);
    success = true;
  } catch (IOException e) {
    LOG.error(ShortCircuitCache.this + ": failed to release " +
        "short-circuit shared memory slot " + slot + " by sending " +
        "ReleaseShortCircuitAccessRequestProto to " + path +
        ".  Closing shared memory segment.", e);
  } finally {
    if (success) {
      shmManager.freeSlot(slot);
    } else {
      shm.getEndpointShmManager().shutdown(shm);
    }
  }
}
{code}
{code}
2019-08-05,15:28:03,838 ERROR [ShortCircuitCache_SlotReleaser] 
org.apache.hadoop.hdfs.shortcircuit.ShortCircuitCache: 
ShortCircuitCache(0x65849546): failed to release short-circuit shared memory 
slot Slot(slotIdx=62, shm=DfsClientShm(70593ef8b3d84cba3c2f0a1e81377eb1)) by 
sending ReleaseShortCircuitAccessRequestProto to 
/home/work/app/hdfs/c3micloudsrv-hdd/datanode/dn_socket.  Closing shared memory 
segment.
java.io.IOException: ERROR_INVALID: there is no shared memory segment 
registered with shmId 70593ef8b3d84cba3c2f0a1e81377eb1
{code}





Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2019-08-05 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/

No changes




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/conf/empty-configuration.xml
 
   hadoop-tools/hadoop-azure/src/config/checkstyle-suppressions.xml 
   hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/public/crossdomain.xml 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-ui/src/main/webapp/public/crossdomain.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-timelineservice-hbase/hadoop-yarn-server-timelineservice-hbase-client
 
   Boxed value is unboxed and then immediately reboxed in 
org.apache.hadoop.yarn.server.timelineservice.storage.common.ColumnRWHelper.readResultsWithTimestamps(Result,
 byte[], byte[], KeyConverter, ValueConverter, boolean) At 
ColumnRWHelper.java:[line 335] 

Failed junit tests :

   hadoop.hdfs.server.namenode.ha.TestBootstrapStandby 
   hadoop.hdfs.server.datanode.TestDataNodeHotSwapVolumes 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
   hadoop.hdfs.server.namenode.TestNameNodeMetadataConsistency 
   hadoop.hdfs.server.datanode.TestDirectoryScanner 
   hadoop.hdfs.server.datanode.TestDataNodeUUID 
   hadoop.hdfs.qjournal.server.TestJournalNodeRespectsBindHostKeys 
   hadoop.hdfs.server.namenode.TestNameNodeHttpServerXFrame 
   hadoop.yarn.client.api.impl.TestAMRMProxy 
   hadoop.yarn.client.api.impl.TestNMClient 
   hadoop.registry.secure.TestSecureLogins 
   hadoop.yarn.server.resourcemanager.TestRMEmbeddedElector 
   hadoop.yarn.server.timelineservice.security.TestTimelineAuthFilterForV2 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-compile-cc-root-jdk1.7.0_95.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-compile-javac-root-jdk1.7.0_95.txt
  [328K]

   cc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-compile-cc-root-jdk1.8.0_222.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-compile-javac-root-jdk1.8.0_222.txt
  [308K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-checkstyle-root.txt
  [16M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-patch-pylint.txt
  [24K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-patch-shellcheck.txt
  [72K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-patch-shelldocs.txt
  [8.0K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/whitespace-eol.txt
  [12M]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/whitespace-tabs.txt
  [1.2M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/xml.txt
  [12K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-timelineservice-hbase_hadoop-yarn-server-timelineservice-hbase-client-warnings.html
  [8.0K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-javadoc-javadoc-root-jdk1.7.0_95.txt
  [16K]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/diff-javadoc-javadoc-root-jdk1.8.0_222.txt
  [1.1M]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [324K]
   
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/404/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-

[jira] [Created] (HDFS-14702) Datanode.ReplicaMap memory leak

2019-08-05 Thread He Xiaoqiao (JIRA)
He Xiaoqiao created HDFS-14702:
--

 Summary: Datanode.ReplicaMap memory leak
 Key: HDFS-14702
 URL: https://issues.apache.org/jira/browse/HDFS-14702
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: datanode
Affects Versions: 2.7.1
Reporter: He Xiaoqiao


DataNode memory is occupied by ReplicaMaps, causing high-frequency GC and 
degraded write performance.
There are about 600K block replicas on the DataNode, but a heap dump shows 
over 8M ReplicaMap items with a footprint over 500MB, which looks like a 
memory leak. One more data point: the block read/write ops on this node are 
very high.
We have not tested HDFS-8859 and have no idea whether it would solve this 
issue.






[jira] [Created] (HDDS-1910) Cannot build hadoop-hdds-config from scratch in IDEA

2019-08-05 Thread Doroszlai, Attila (JIRA)
Doroszlai, Attila created HDDS-1910:
---

 Summary: Cannot build hadoop-hdds-config from scratch in IDEA
 Key: HDDS-1910
 URL: https://issues.apache.org/jira/browse/HDDS-1910
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
  Components: build
Reporter: Doroszlai, Attila
Assignee: Doroszlai, Attila


Building {{hadoop-hdds-config}} from scratch (e.g. right after checkout or 
after {{mvn clean}}) in IDEA fails with the following error:

{code}
Error:java: Bad service configuration file, or exception thrown while 
constructing Processor object: javax.annotation.processing.Processor: Provider 
org.apache.hadoop.hdds.conf.ConfigFileGenerator not found
{code}






[jira] [Reopened] (HDFS-14674) [SBN read] Got an unexpected txid when tail editlog

2019-08-05 Thread Wei-Chiu Chuang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei-Chiu Chuang reopened HDFS-14674:


[~wangzhaohui] this patch is not yet committed. Reopening this jira for now.

> [SBN read] Got an unexpected txid when tail editlog
> ---
>
> Key: HDFS-14674
> URL: https://issues.apache.org/jira/browse/HDFS-14674
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>Reporter: wangzhaohui
>Assignee: wangzhaohui
>Priority: Blocker
> Attachments: HDFS-14674-001.patch, HDFS-14674-003.patch, 
> HDFS-14674-004.patch, image-2019-07-26-11-34-23-405.png
>
>
> Add the following configuration:
> !image-2019-07-26-11-34-23-405.png!
> error:
> {code:java}
> //
> [2019-07-17T11:50:21.048+08:00] [INFO] [Edit log tailer] : replaying edit 
> log: 1/20512836 transactions completed. (0%) [2019-07-17T11:50:21.059+08:00] 
> [INFO] [Edit log tailer] : Edits file 
> http://ip/getJournal?jid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232056426162&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  of size 3126782311 edits # 500 loaded in 3 seconds 
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Reading 
> org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@51ceb7bc 
> expecting start txid #232056752162 [2019-07-17T11:50:21.059+08:00] [INFO] 
> [Edit log tailer] : Start loading edits file 
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH
>  maxTxnipsToRead = 500 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log 
> tailer] : Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH,
>  
> http://ip/getJournal?ipjid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.059+08:00] [INFO] [Edit 
> log tailer] ip: Fast-forwarding stream 
> 'http://ip/getJournal?jid=ns1003&segmentTxId=232077264498&storageInfo=-63%3A1902204348%3A0%3ACID-hope-20180214-20161018-SQYH'
>  to transaction ID 232056751662 [2019-07-17T11:50:21.061+08:00] [ERROR] [Edit 
> log tailer] : Unknown error encountered while tailing edits. Shutting down 
> standby NN. java.io.IOException: There appears to be a gap in the edit log. 
> We expected txid 232056752162, but got txid 232077264498. at 
> org.apache.hadoop.hdfs.server.namenode.MetaRecoveryContext.editLogLoaderPrompt(MetaRecoveryContext.java:94)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadEditRecords(FSEditLogLoader.java:239)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSEditLogLoader.loadFSEdits(FSEditLogLoader.java:161)
>  at 
> org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:895) at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:321)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:460)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$400(EditLogTailer.java:410)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:427)
>  at 
> org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:414)
>  at 
> org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:423)
>  [2019-07-17T11:50:21.064+08:00] [INFO] [Edit log tailer] : Exiting with 
> status 1 [2019-07-17T11:50:21.066+08:00] [INFO] [Thread-1] : SHUTDOWN_MSG: 
> / SHUTDOWN_MSG: 
> Shutting down NameNode at ip 
> /
> {code}
>  
> If dfs.ha.tail-edits.max-txns-per-lock is set to 500, the NameNode stops 
> loading the current edit log segment once it has applied 500 transactions 
> and moves on to the next segment, even though that segment contains more 
> than 500 transactions. As a result, the NameNode gets an unexpected txid 
> when tailing the edit log (see the sketch after the quoted report).
>  
>  
> {code:java}
> //
> [2019-07-17T11:50:21.059+08:00] [INFO] [Edit log tailer] : Edits file 
> htt
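
For reference, a sketch of the setting under discussion (an assumption based 
on the property named in the description; the screenshot attachment is not 
reproduced in this digest):

{code:java}
import org.apache.hadoop.conf.Configuration;

// With this limit, the tailer applies at most 500 transactions per lock
// acquisition; per the description, it then moves on to the next segment
// even though the current one holds more transactions, which produces the
// "gap in the edit log" error quoted above.
Configuration conf = new Configuration();
conf.setInt("dfs.ha.tail-edits.max-txns-per-lock", 500);
{code}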

Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2019-08-05 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/

[Aug 4, 2019 5:26:25 AM] (bharat) HDDS-1870. ConcurrentModification at 
PrometheusMetricsSink (#1179)
[Aug 4, 2019 5:33:01 AM] (bharat) HDDS-1896. Suppress WARN log from 
NetworkTopology#getDistanceCost.




-1 overall


The following subsystems voted -1:
asflicense findbugs hadolint pathlen unit xml


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

XML :

   Parsing Error(s): 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-excerpt.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-output-missing-tags2.xml
 
   
hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-nodemanager/src/test/resources/nvidia-smi-sample-output.xml
 

FindBugs :

   
module:hadoop-yarn-project/hadoop-yarn/hadoop-yarn-applications/hadoop-yarn-applications-mawo/hadoop-yarn-applications-mawo-core
 
   Class org.apache.hadoop.applications.mawo.server.common.TaskStatus 
implements Cloneable but does not define or use clone method At 
TaskStatus.java:[lines 39-346] 
   Equals method for 
org.apache.hadoop.applications.mawo.server.worker.WorkerId assumes the argument 
is of type WorkerId At WorkerId.java:[line 114] 
   
org.apache.hadoop.applications.mawo.server.worker.WorkerId.equals(Object) does 
not check for null argument At WorkerId.java:[lines 114-115] 

Failed junit tests :

   hadoop.hdfs.TestDFSInotifyEventInputStreamKerberized 
   hadoop.hdfs.TestReconstructStripedFile 
   hadoop.hdfs.server.datanode.TestLargeBlockReport 
   hadoop.hdfs.server.datanode.fsdataset.impl.TestSpaceReservation 
   hadoop.fs.http.server.TestHttpFSServer 
   hadoop.hdfs.server.federation.router.TestRouterWithSecureStartup 
   hadoop.hdfs.server.federation.security.TestRouterHttpDelegationToken 
   hadoop.fs.azurebfs.services.TestAbfsClientThrottlingAnalyzer 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-compile-javac-root.txt
  [332K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-checkstyle-root.txt
  [17M]

   hadolint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-patch-hadolint.txt
  [4.0K]

   pathlen:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/pathlen.txt
  [12K]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-patch-pylint.txt
  [220K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/diff-patch-shelldocs.txt
  [44K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/whitespace-eol.txt
  [9.6M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/whitespace-tabs.txt
  [1.1M]

   xml:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/xml.txt
  [16K]

   findbugs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-applications_hadoop-yarn-applications-mawo_hadoop-yarn-applications-mawo-core-warnings.html
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-hdds_container-service.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-hdds_server-scm.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-ozone_client.txt
  [8.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-ozone_objectstore-service.txt
  [4.0K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/1219/artifact/out/branch-findbugs-hadoop-ozone_ozo

Topics for Hadoop storage online sync

2019-08-05 Thread Wei-Chiu Chuang
Hello!

For this week's community online sync (English, Wednesday 9am US Pacific
Time), we will have CR Hota from Uber talk about the latest updates in
Router-Based Federation.

He will touch upon the following topics:
1. Security (development and ZooKeeper scale-testing learnings)
2. Isolation for multiple clusters
3. Routers for Observer namenodes (our internal design; the open-source
implementation is yet to be done)
4. DNS support

In case you missed the past community online syncs, here's the access
information (Zoom) and meeting notes:
https://docs.google.com/document/d/1jXM5Ujvf-zhcyw_5kiQVx6g-HeKe-YGnFS_1-qFXomI/edit

I am also looking for topics and maybe demos for the upcoming Mandarin
community sync call this week and in the future. So definitely reach out to
me so we can announce it in advance.

Thanks all!
Weichiu


HDFS At Rest Encryption (Encryption Zone, KMS) Improvements

2019-08-05 Thread Wei-Chiu Chuang
There is a bunch of stuff that could use some attention to improve HDFS At
Rest Encryption (aka Transparent Data Encryption).

Here's a spreadsheet of features, bug fixes, and improvements:
https://docs.google.com/spreadsheets/d/13oeUy0Mvmq6Ngvw9IjkY9KXiQyoRIcRlijKXG1YqGug/edit?usp=sharing

I am planning to spend the next few months focusing on KMS. If you are
looking for something to contribute, if you care about data encryption, or if
you have an EZ/KMS patch pending review, now's a good time.

Best,
Weichiu


[jira] [Created] (HDDS-1911) Support Prefix ACL operations for OM HA.

2019-08-05 Thread Bharat Viswanadham (JIRA)
Bharat Viswanadham created HDDS-1911:


 Summary: Support Prefix ACL operations for OM HA.
 Key: HDDS-1911
 URL: https://issues.apache.org/jira/browse/HDDS-1911
 Project: Hadoop Distributed Data Store
  Issue Type: Sub-task
Reporter: Bharat Viswanadham
Assignee: Bharat Viswanadham


+HDDS-1541+ adds four new APIs to the Ozone RPC client. The OM HA 
implementation needs to handle them.






[jira] [Created] (HDDS-1912) start-ozone.sh fail due to ozone-config.sh not found

2019-08-05 Thread kevin su (JIRA)
kevin su created HDDS-1912:
--

 Summary: start-ozone.sh fail due to ozone-config.sh not found 
 Key: HDDS-1912
 URL: https://issues.apache.org/jira/browse/HDDS-1912
 Project: Hadoop Distributed Data Store
  Issue Type: Improvement
  Components: Ozone CLI
Affects Versions: 0.5.0
Reporter: kevin su
 Fix For: 0.5.0


I want to run Ozone on its own, but start-ozone.sh always looks for 
ozone-config.sh in *$HADOOP_HOME*/libexec first.

If the file is not found there, the script fails.

The script should look for this file in both *$HADOOP_HOME*/libexec and 
*$OZONE_HOME*/libexec.






[jira] [Created] (HDFS-14703) NameNode Fine-Grained Locking via Metadata Partitioning

2019-08-05 Thread Konstantin Shvachko (JIRA)
Konstantin Shvachko created HDFS-14703:
--

 Summary: NameNode Fine-Grained Locking via Metadata Partitioning
 Key: HDFS-14703
 URL: https://issues.apache.org/jira/browse/HDFS-14703
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: hdfs, namenode
Reporter: Konstantin Shvachko


We aim to enable fine-grained locking by splitting the in-memory namespace 
into multiple partitions, each guarded by a separate lock. This is intended 
to improve the performance of NameNode write operations.
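
Purely as an illustration of the general idea (a toy sketch, not the design 
proposed in this jira, which is still to be worked out): partition an 
id-keyed map and guard each partition with its own lock, so writes to 
different partitions no longer serialize on one global namesystem lock.

{code:java}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class PartitionedInodeMap {
  private static final int NUM_PARTITIONS = 64; // illustrative choice

  private final Map<Long, Object>[] partitions;
  private final ReentrantReadWriteLock[] locks;

  @SuppressWarnings("unchecked")
  PartitionedInodeMap() {
    partitions = new Map[NUM_PARTITIONS];
    locks = new ReentrantReadWriteLock[NUM_PARTITIONS];
    for (int i = 0; i < NUM_PARTITIONS; i++) {
      partitions[i] = new HashMap<>();
      locks[i] = new ReentrantReadWriteLock();
    }
  }

  // Simple modulo partitioning; a real design must also handle operations
  // (e.g. rename) that span partitions.
  private int partitionOf(long inodeId) {
    return (int) (inodeId % NUM_PARTITIONS);
  }

  Object get(long inodeId) {
    int p = partitionOf(inodeId);
    locks[p].readLock().lock();
    try {
      return partitions[p].get(inodeId);
    } finally {
      locks[p].readLock().unlock();
    }
  }

  void put(long inodeId, Object inode) {
    int p = partitionOf(inodeId);
    locks[p].writeLock().lock();
    try {
      partitions[p].put(inodeId, inode);
    } finally {
      locks[p].writeLock().unlock();
    }
  }
}
{code}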






[jira] [Created] (HDDS-1913) Fix OzoneBucket and RpcClient APIS for acl

2019-08-05 Thread Bharat Viswanadham (JIRA)
Bharat Viswanadham created HDDS-1913:


 Summary: Fix OzoneBucket and RpcClient APIS for acl
 Key: HDDS-1913
 URL: https://issues.apache.org/jira/browse/HDDS-1913
 Project: Hadoop Distributed Data Store
  Issue Type: Bug
Reporter: Bharat Viswanadham
Assignee: Bharat Viswanadham


Fix addAcl/removeAcl in OzoneBucket to use the acl APIs addAcl/removeAcl 
newly added as part of HDDS-1739.

Remove addBucketAcls and removeBucketAcls from RpcClient; we should use 
addAcl/removeAcl instead.
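
A hypothetical sketch of the intended direction (the method shapes are 
assumptions based on the description above, not the committed change); here 
{{proxy}} and {{ozoneObj}} stand for the bucket's client protocol and its 
OzoneObj handle:

{code:java}
// In OzoneBucket: delegate to the unified acl API introduced by HDDS-1739
// instead of the legacy bucket-specific RpcClient calls.
public boolean addAcl(OzoneAcl acl) throws IOException {
  return proxy.addAcl(ozoneObj, acl);    // replaces addBucketAcls(...)
}

public boolean removeAcl(OzoneAcl acl) throws IOException {
  return proxy.removeAcl(ozoneObj, acl); // replaces removeBucketAcls(...)
}
{code}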






[jira] [Created] (HDFS-14704) RBF:NnId should not be null in NamenodeHeartbeatService

2019-08-05 Thread xuzq (JIRA)
xuzq created HDFS-14704:
---

 Summary: RBF:NnId should not be null in NamenodeHeartbeatService
 Key: HDFS-14704
 URL: https://issues.apache.org/jira/browse/HDFS-14704
 Project: Hadoop HDFS
  Issue Type: Improvement
  Components: rbf
Reporter: xuzq


NnId should not be null in NamenodeHeartbeatService.

If NnId is null, the heartbeat service keeps logging an error like:
{code:java}
2019-08-06 10:38:07,455 ERROR router.NamenodeHeartbeatService 
(NamenodeHeartbeatService.java:updateState(229)) - Unhandled exception updating 
NN registration for ns1:null
java.lang.NullPointerException
at 
org.apache.hadoop.hdfs.federation.protocol.proto.HdfsServerFederationProtos$NamenodeMembershipRecordProto$Builder.setServiceAddress(HdfsServerFederationProtos.java:3831)
at 
org.apache.hadoop.hdfs.server.federation.store.records.impl.pb.MembershipStatePBImpl.setServiceAddress(MembershipStatePBImpl.java:119)
at 
org.apache.hadoop.hdfs.server.federation.store.records.MembershipState.newInstance(MembershipState.java:108)
at 
org.apache.hadoop.hdfs.server.federation.resolver.MembershipNamenodeResolver.registerNamenode(MembershipNamenodeResolver.java:267)
at 
org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.updateState(NamenodeHeartbeatService.java:223)
at 
org.apache.hadoop.hdfs.server.federation.router.NamenodeHeartbeatService.periodicInvoke(NamenodeHeartbeatService.java:159)
at 
org.apache.hadoop.hdfs.server.federation.router.PeriodicService$1.run(PeriodicService.java:178)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748){code}
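
A possible shape of the fix (a sketch based on the summary; the variable 
names nnId and nsId are assumptions, not taken from an actual patch):

{code:java}
// Validate the namenode ID up front in NamenodeHeartbeatService and fail
// with an explicit message instead of an NPE deep inside the protobuf
// setters, as in the stack trace above.
if (nnId == null || nnId.isEmpty()) {
  LOG.error("Namenode ID is not configured for nameservice {}; "
      + "skipping NN registration", nsId);
  return;
}
{code}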






[jira] [Reopened] (HDFS-14652) HealthMonitor connection retry times should be configurable

2019-08-05 Thread Chen Zhang (JIRA)


 [ 
https://issues.apache.org/jira/browse/HDFS-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chen Zhang reopened HDFS-14652:
---

The patch missed adding the new properties to core-default.xml.

> HealthMonitor connection retry times should be configurable
> ---
>
> Key: HDFS-14652
> URL: https://issues.apache.org/jira/browse/HDFS-14652
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Chen Zhang
>Assignee: Chen Zhang
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: HDFS-14652-001.patch, HDFS-14652-002.patch
>
>
> On our production HDFS cluster, burst requests from some clients filled the 
> TCP kernel queue on the NameNode's host. Since the configuration value of 
> "net.ipv4.tcp_syn_retries" in our environment is 1, after 3 seconds the 
> ZooKeeper HealthMonitor got a connection error like this:
> {code:java}
> WARN org.apache.hadoop.ha.HealthMonitor: Transport-level exception trying to 
> monitor health of NameNode at nn_host_name/ip_address:port: Call From 
> zkfc_host_name/ip to nn_host_name:port failed on connection exception: 
> java.net.ConnectException: Connection timed out; For more details see: 
> http://wiki.apache.org/hadoop/ConnectionRefused
> {code}
> This error caused a failover and affected the availability of that cluster. 
> We fixed the issue by raising the kernel parameter net.ipv4.tcp_syn_retries 
> to 6.
> But while working on this issue, we found that the connection retry count 
> (ipc.client.connect.max.retries) of the health monitor is hard-coded to 1. I 
> think it should be configurable; then, if we don't want the health monitor 
> to be so sensitive, we can change its behavior via this configuration.
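
A minimal sketch of the direction the summary suggests (the property name and 
default below are assumptions, not necessarily what the committed patch uses):

{code:java}
// Read the HealthMonitor's IPC connect retry count from configuration
// instead of hard-coding it to 1, so operators can make the monitor less
// sensitive to transient connection failures.
final int rpcConnectMaxRetries = conf.getInt(
    "ha.health-monitor.rpc.connect.max.retries", 1); // old hard-coded value
{code}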


