[jira] [Resolved] (HDFS-12691) Ozone: Decrease interval time of SCMBlockDeletingService for improving the efficiency

2017-10-24 Thread Yiqun Lin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yiqun Lin resolved HDFS-12691.
--
Resolution: Later

> Ozone: Decrease interval time of SCMBlockDeletingService for improving the 
> efficiency
> -
>
> Key: HDFS-12691
> URL: https://issues.apache.org/jira/browse/HDFS-12691
> Project: Hadoop HDFS
>  Issue Type: Sub-task
>  Components: ozone, performance
>Affects Versions: HDFS-7240
>Reporter: Yiqun Lin
>Assignee: Yiqun Lin
>
> After logging the elapsed time of each block deletion task, I found some 
> places where we can make improvements. The logging during testing:
> {noformat}
> 2017-10-20 17:02:55,168 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:56,169 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:56,451 [SCMBlockDeletingService#0] INFO  
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background 
> service : SCMBlockDeletingService
> 2017-10-20 17:02:56,755 [KeyDeletingService#0] INFO  utils.BackgroundService 
> (BackgroundService.java:run(99)) - Running background service : 
> KeyDeletingService
> 2017-10-20 17:02:56,758 [KeyDeletingService#1] INFO  ksm.KeyDeletingService 
> (KeyDeletingService.java:call(99))  - Found 11 to-delete keys in KSM
> 2017-10-20 17:02:56,775 [IPC Server handler 19 on 52342] INFO  
> scm.StorageContainerManager 
> (StorageContainerManager.java:deleteKeyBlocks(870))  - SCM is informed by 
> KSM to delete 11 blocks
> 2017-10-20 17:02:57,182 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:02:57,640 [KeyDeletingService#1] INFO  ksm.KeyDeletingService 
> (KeyDeletingService.java:call(125))  - Number of key deleted from KSM DB: 
> 11, task elapsed time: 885ms
> 2017-10-20 17:02:58,168 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:03,178 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> ...
> 2017-10-20 17:03:04,167 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:05,173 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:06,095 [BlockDeletingService#0] INFO  
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background 
> service : BlockDeletingService
> 2017-10-20 17:03:06,095 [BlockDeletingService#0] INFO  
> background.BlockDeletingService (BlockDeletingService.java:getTasks(109)) 
>  - Plan to choose 10 containers for block deletion, actually returns 0 valid 
> containers.
> 2017-10-20 17:03:06,171 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> ...
> 2017-10-20 17:03:54,279 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:55,267 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:56,282 [Datanode State Machine Thread - 0] INFO  
> Configuration.deprecation (Configuration.java:logDeprecation(1306)) - No unit 
> for ozone.scm.heartbeat.interval.seconds(1) assuming SECONDS
> 2017-10-20 17:03:56,461 [SCMBlockDeletingService#0] INFO  
> utils.BackgroundService (BackgroundService.java:run(99)) - Running background 
> service : SCMBlockDeletingService
> 2017-10-20 17:03:56,467 [SCMBlockDeletingService#1] INFO  
> block.SCMBlockDeletingService (SCMBlockDeletingService.java:call(129))  - 
> Totally added 11 delete blocks command for 1 datanodes, task elapsed time: 6ms
> 2017-10-20 17:03:57,265 [Datanode State Machine Thread - 0] INFO  
> ...
> {noformat}

[jira] [Created] (HDFS-12701) More fine-grained locks in ShortCircuitCache

2017-10-24 Thread Weiwei Yang (JIRA)
Weiwei Yang created HDFS-12701:
--

 Summary: More fine-grained locks in ShortCircuitCache
 Key: HDFS-12701
 URL: https://issues.apache.org/jira/browse/HDFS-12701
 Project: Hadoop HDFS
  Issue Type: Improvement
Affects Versions: 2.8.1
Reporter: Weiwei Yang


When the cluster is heavily loaded, we found HBase regionserver handlers are 
often blocked by {{ShortCircuitCache}}. Dumping jstack showed lots of threads 
waiting to obtain the cache lock. This should be improvable by using more 
fine-grained locks.
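
As a rough illustration of the idea (hypothetical class, not the actual 
{{ShortCircuitCache}} code), lock striping partitions the cache so that 
handlers touching different keys no longer serialize on a single lock:

{code}
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch: a cache guarded by N lock stripes instead of one
// global lock, so unrelated keys no longer contend on the same monitor.
public class StripedCache<K, V> {
  private static final int STRIPES = 16;
  private final ReentrantLock[] locks = new ReentrantLock[STRIPES];
  private final Map<K, V>[] segments;

  @SuppressWarnings("unchecked")
  public StripedCache() {
    segments = (Map<K, V>[]) new Map[STRIPES];
    for (int i = 0; i < STRIPES; i++) {
      locks[i] = new ReentrantLock();
      segments[i] = new HashMap<>();
    }
  }

  // Map a key onto one stripe; masking keeps the index non-negative.
  private int stripeFor(Object key) {
    return (key.hashCode() & 0x7fffffff) % STRIPES;
  }

  public V get(K key) {
    int s = stripeFor(key);
    locks[s].lock();
    try {
      return segments[s].get(key);
    } finally {
      locks[s].unlock();
    }
  }

  public void put(K key, V value) {
    int s = stripeFor(key);
    locks[s].lock();
    try {
      segments[s].put(key, value);
    } finally {
      locks[s].unlock();
    }
  }
}
{code}

Threads operating on keys in different stripes proceed in parallel; only 
same-stripe accesses contend, which is usually enough to unblock handler 
threads under load.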



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/10/

[Oct 23, 2017 4:58:04 PM] (epayne) YARN-4163: Audit getQueueInfo and 
getApplications calls
[Oct 23, 2017 5:47:35 PM] (arp) HDFS-12683. DFSZKFailOverController re-order 
logic for logging
[Oct 23, 2017 6:17:33 PM] (naganarasimha_gr) HADOOP-14966. Handle JDK-8071638 
for hadoop-common. Contributed by Bibin
[Oct 23, 2017 8:52:15 PM] (templedf) YARN-7357. Several methods in
[Oct 24, 2017 2:50:39 AM] (yqlin) HDFS-12695. Add a link to HDFS router 
federation document in site.xml.


[Error replacing 'FILE' - Workspace is not accessible]

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer  
> wrote:
> 
> 
> 
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
> this “feels” like.
> 
> Someone should verify if 2.8.2 has the same issues before a release goes out …


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn’t die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM usage in the HDFS unit tests is 
going through the roof.  It’s also interesting how MANY log files there are.  
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting 
tight. 

Anyway, at this point, branch-2.8 and higher are probably fubar’d. 
Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker containers 
can have their RAM limits set in order to prevent more nodes going catatonic.



-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12702) Ozone: Add hugo to the dev docker image

2017-10-24 Thread Elek, Marton (JIRA)
Elek, Marton created HDFS-12702:
---

 Summary: Ozone: Add hugo to the dev docker image
 Key: HDFS-12702
 URL: https://issues.apache.org/jira/browse/HDFS-12702
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ozone
Affects Versions: HDFS-7240
Reporter: Elek, Marton


Both HADOOP-14163 and HDFS-12664 require the hugo site generation tool. To 
make it easier to review those patches, I suggest adding hugo to the dev 
docker image now.

This patch adds hugo to the dev docker image:

Test method:
{code}
cd dev-support/docker 
docker build -t test . 
docker run test hugo version
docker rmi test
{code}

Expected output (after docker run):
{code}
Hugo Static Site Generator v0.30.2 linux/amd64 BuildDate: 2017-10-19T11:34:27Z
{code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Eric Badger
+1 (non-binding)

- Verified all hashes and checksums
- Built from source on macOS 10.12.6, Java 1.8.0u65
- Deployed a pseudo cluster
- Ran some example jobs

Thanks,

Eric

On Tue, Oct 24, 2017 at 12:59 AM, Mukul Kumar Singh 
wrote:

> Thanks Junping,
>
> +1 (non-binding)
>
> I built from source on Mac OS X 10.12.6 Java 1.8.0_111
>
> - Deployed on a single node cluster.
> - Deployed a ViewFS cluster with two hdfs mount points.
> - Performed basic sanity checks.
> - Performed basic DFS operations.
>
> Thanks,
> Mukul
>
>
>
>
>
>
> On 20/10/17, 6:12 AM, "Junping Du"  wrote:
>
> >Hi folks,
> > I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.
> >
> > Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line
> and will be the latest stable/production release for Apache Hadoop - it
> includes 315 new fixed issues since 2.8.1 and 69 fixes are marked as
> blocker/critical issues.
> >
> >  More information about the 2.8.2 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
> >
> >  New RC is available at: http://home.apache.org/~junping_du/hadoop-2.8.2-RC1
> >
> >  The RC tag in git is: release-2.8.2-RC1, and the latest commit id
> is: 66c47f2a01ad9637879e95f80c41f798373828fb
> >
> >  The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1064
> >
> >  Please try the release and vote; the vote will run for the usual 5
> days, ending on 10/24/2017 6pm PST time.
> >
> >Thanks,
> >
> >Junping
> >
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[jira] [Created] (HDFS-12703) Exceptions are fatal to decommissioning monitor

2017-10-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12703:
--

 Summary: Exceptions are fatal to decommissioning monitor
 Key: HDFS-12703
 URL: https://issues.apache.org/jira/browse/HDFS-12703
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.7.0
Reporter: Daryn Sharp
Priority: Critical


The {{DecommissionManager.Monitor}} runs as an executor scheduled task.  If an 
exception occurs, all decommissioning ceases until the NN is restarted.  Per 
javadoc for {{executor#scheduleAtFixedRate}}: *If any execution of the task 
encounters an exception, subsequent executions are suppressed*.  The monitor 
thread is alive but blocked waiting for an executor task that will never come.  
The code currently disposes of the future so the actual exception that aborted 
the task is gone.

Failover is insufficient since the task is also likely dead on the standby.  
Replication queue init after the transition to active will fix the 
under-replication of blocks on currently decommissioning nodes, but future 
nodes will never decommission.  The standby must be bounced prior to failover – 
and hopefully the error condition does not reoccur.
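
The javadoc behavior is easy to demonstrate in isolation. A self-contained 
sketch (names are made up; this is not the DecommissionManager code) showing 
both the failure mode and the usual defensive fix of catching Throwable 
inside the task body:

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class SuppressedTaskDemo {
  public static void main(String[] args) throws InterruptedException {
    ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor();

    // Unsafe: the 3rd run throws, and the executor silently cancels every
    // subsequent run -- the failure mode described in the javadoc above.
    final int[] runs = {0};
    executor.scheduleAtFixedRate(() -> {
      runs[0]++;
      System.out.println("run " + runs[0]);
      if (runs[0] == 3) {
        throw new RuntimeException("simulated monitor failure");
      }
    }, 0, 100, TimeUnit.MILLISECONDS);
    Thread.sleep(1000);  // prints only "run 1" .. "run 3", then nothing

    // Safe variant: catch everything inside the task so one bad iteration
    // cannot kill the schedule; log the Throwable instead of losing it.
    executor.scheduleAtFixedRate(() -> {
      try {
        // ... monitor work would go here ...
      } catch (Throwable t) {
        System.err.println("Monitor iteration failed: " + t);
      }
    }, 0, 100, TimeUnit.MILLISECONDS);
    Thread.sleep(300);

    executor.shutdownNow();
  }
}
{code}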



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Created] (HDFS-12704) FBR may corrupt block state

2017-10-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12704:
--

 Summary: FBR may corrupt block state
 Key: HDFS-12704
 URL: https://issues.apache.org/jira/browse/HDFS-12704
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: namenode
Affects Versions: 2.8.0
Reporter: Daryn Sharp
Priority: Critical


If FBR processing generates a runtime exception, it is believed to foul the 
block state and lead to unpredictable behavior.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Bibinchundatt
+1 (non-binding)

- Build from source 1.8.0_111
- Deployed on 3 node secure setup
- Ran few mapreduce jobs with multiple users.
- Verified basic resource localization
- Failover of Resource Manager.
- Log aggregation verification
- Sanity check of JHS

Thanks
Bibin
--
Bibin A Chundatt
M: +91-9742095715
E: bibin.chund...@huawei.com
2012 Laboratories-IT&Cloud BU Branch Dept.

> On 20/10/17, 6:12 AM, "Junping Du"  wrote:
>
> >Hi folks,
> > I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.
> >
> > Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line
> and will be the latest stable/production release for Apache Hadoop - it
> includes 315 new fixed issues since 2.8.1 and 69 fixes are marked as
> blocker/critical issues.
> >
> >  More information about the 2.8.2 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
> >
> >  New RC is available at: http://home.apache.org/~junping_du/hadoop-2.8.2-RC1
> >
> >  The RC tag in git is: release-2.8.2-RC1, and the latest commit id
> is: 66c47f2a01ad9637879e95f80c41f798373828fb
> >
> >  The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1064
> >
> >  Please try the release and vote; the vote will run for the usual 5
> days, ending on 10/24/2017 6pm PST time.
> >
> >Thanks,
> >
> >Junping
> >
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[jira] [Created] (HDFS-12705) WebHdfsFileSystem exceptions should retain the caused by exception

2017-10-24 Thread Daryn Sharp (JIRA)
Daryn Sharp created HDFS-12705:
--

 Summary: WebHdfsFileSystem exceptions should retain the caused by 
exception
 Key: HDFS-12705
 URL: https://issues.apache.org/jira/browse/HDFS-12705
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs
Affects Versions: 2.8.0
Reporter: Daryn Sharp


{{WebHdfsFileSystem#runWithRetry}} uses reflection to prepend the remote host 
to the exception.  While it preserves the original stacktrace, it omits the 
original cause, which complicates debugging.
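
A hedged sketch of the fix shape (illustrative only, not the actual 
{{WebHdfsFileSystem}} code; the helper name is made up): when reflectively 
rebuilding the exception with the host prepended, re-attach the original 
cause chain as well:

{code}
import java.io.IOException;
import java.lang.reflect.Constructor;

public final class ExceptionRewrap {
  private ExceptionRewrap() {}

  // Rebuild the same exception type with the remote host prepended to the
  // message, keeping both the original stacktrace and the original cause.
  // Falls back to returning the input untouched if reflection fails.
  public static IOException prependHost(String host, IOException e) {
    try {
      Constructor<? extends IOException> ctor =
          e.getClass().getConstructor(String.class);
      IOException wrapped = ctor.newInstance(host + ": " + e.getMessage());
      wrapped.setStackTrace(e.getStackTrace());
      if (e.getCause() != null) {
        wrapped.initCause(e.getCause());  // the step HDFS-12705 asks for
      }
      return wrapped;
    } catch (ReflectiveOperationException | IllegalStateException ex) {
      // No (String) constructor, or the copy already had a cause set.
      return e;
    }
  }
}
{code}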



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Apache Hadoop qbt Report: trunk+JDK8 on Linux/x86

2017-10-24 Thread Apache Jenkins Server
For more details, see 
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/

[Oct 23, 2017 4:43:41 PM] (epayne) YARN-4163: Audit getQueueInfo and 
getApplications calls
[Oct 23, 2017 5:47:16 PM] (arp) HDFS-12683. DFSZKFailOverController re-order 
logic for logging
[Oct 23, 2017 6:12:06 PM] (naganarasimha_gr) HADOOP-14966. Handle JDK-8071638 
for hadoop-common. Contributed by Bibin
[Oct 23, 2017 8:20:46 PM] (arp) HDFS-12650. Use slf4j instead of log4j in 
LeaseManager. Contributed by
[Oct 23, 2017 8:51:19 PM] (templedf) YARN-7357. Several methods in
[Oct 23, 2017 10:24:34 PM] (weichiu) HDFS-12249. dfsadmin -metaSave to output 
maintenance mode blocks.
[Oct 23, 2017 10:32:25 PM] (jzhuge) Revert "HADOOP-14954. 
MetricsSystemImpl#init should increment refCount
[Oct 24, 2017 12:56:56 AM] (rkanter) YARN-7320. Duplicate LiteralByteStrings in
[Oct 24, 2017 2:46:09 AM] (yqlin) HDFS-12695. Add a link to HDFS router 
federation document in site.xml.




-1 overall


The following subsystems voted -1:
asflicense unit


The following subsystems voted -1 but
were configured to be filtered/ignored:
cc checkstyle javac javadoc pylint shellcheck shelldocs whitespace


The following subsystems are considered long running:
(runtime bigger than 1h  0m  0s)
unit


Specific tests:

Failed junit tests :

   hadoop.security.TestRaceWhenRelogin 
   hadoop.hdfs.TestDFSStripedOutputStreamWithFailure 
   hadoop.hdfs.server.datanode.TestDataNodeVolumeFailure 
   hadoop.hdfs.web.TestWebHdfsTimeouts 
   hadoop.yarn.server.nodemanager.TestNodeStatusUpdater 
   hadoop.yarn.server.nodemanager.scheduler.TestDistributedScheduler 
   hadoop.yarn.server.resourcemanager.TestRMAdminService 
   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFSLeafQueue 
   hadoop.yarn.server.resourcemanager.scheduler.fair.TestFairScheduler 
   hadoop.yarn.server.TestContainerManagerSecurity 

Timed out junit tests :

   
org.apache.hadoop.yarn.client.api.impl.TestOpportunisticContainerAllocationE2E 
   org.apache.hadoop.mapred.pipes.TestPipeApplication 
  

   cc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-compile-cc-root.txt
  [4.0K]

   javac:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-compile-javac-root.txt
  [284K]

   checkstyle:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-checkstyle-root.txt
  [17M]

   pylint:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-patch-pylint.txt
  [20K]

   shellcheck:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-patch-shellcheck.txt
  [20K]

   shelldocs:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-patch-shelldocs.txt
  [12K]

   whitespace:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/whitespace-eol.txt
  [8.5M]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/whitespace-tabs.txt
  [292K]

   javadoc:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/diff-javadoc-javadoc-root.txt
  [760K]

   unit:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt
  [148K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
  [400K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-nodemanager.txt
  [40K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-resourcemanager.txt
  [68K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-server_hadoop-yarn-server-tests.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-yarn-project_hadoop-yarn_hadoop-yarn-client.txt
  [12K]
   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-unit-hadoop-mapreduce-project_hadoop-mapreduce-client_hadoop-mapreduce-client-jobclient.txt
  [84K]

   asflicense:

   
https://builds.apache.org/job/hadoop-qbt-trunk-java8-linux-x86/567/artifact/out/patch-asflicense-problems.txt
  [4.0K]

Powered by Apache Yetus 0.6.0-SNAPSHOT   http://yetus.apache.org

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org

[jira] [Created] (HDFS-12706) Allow overriding HADOOP_SHELL_EXECNAME

2017-10-24 Thread Arpit Agarwal (JIRA)
Arpit Agarwal created HDFS-12706:


 Summary: Allow overriding HADOOP_SHELL_EXECNAME
 Key: HDFS-12706
 URL: https://issues.apache.org/jira/browse/HDFS-12706
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Arpit Agarwal


Some Hadoop shell scripts infer their own name using this bit of shell magic:

{code}
 18 MYNAME="${BASH_SOURCE-$0}"
 19 HADOOP_SHELL_EXECNAME="${MYNAME##*/}"
{code}

e.g. see the 
[hdfs|https://github.com/apache/hadoop/blob/trunk/hadoop-hdfs-project/hadoop-hdfs/src/main/bin/hdfs#L18]
 script.

The inferred shell script name is later passed to _hadoop-functions.sh_ which 
uses it to construct the names of some environment variables. E.g. when 
invoking _hdfs datanode_, the options variable name is inferred as follows:
{code}
# HDFS + DATANODE + OPTS -> HDFS_DATANODE_OPTS
{code}

This works well if the calling script name is the standard {{hdfs}} or 
{{yarn}}. If a distribution renames the script to something like {{foo.bar}}, 
then the variable name will be inferred as {{FOO.BAR_DATANODE_OPTS}}, which is 
not a valid bash variable name.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Rakesh Radhakrishnan
Thanks Junping for getting this out.

+1 (non-binding)

* Built from source on CentOS 7.3.1611, jdk1.8.0_111
* Deployed 3 node cluster
* Ran some sample jobs
* Ran balancer
* Operate HDFS from command line: ls, put, dfsadmin etc
* HDFS Namenode UI looks good


Thanks,
Rakesh

On Fri, Oct 20, 2017 at 6:12 AM, Junping Du  wrote:

> Hi folks,
>  I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.
>
>  Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line
> and will be the latest stable/production release for Apache Hadoop - it
> includes 315 new fixed issues since 2.8.1 and 69 fixes are marked as
> blocker/critical issues.
>
>   More information about the 2.8.2 release plan can be found here:
> https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
>
> >   New RC is available at: http://home.apache.org/~junping_du/hadoop-2.8.2-RC1
>
>   The RC tag in git is: release-2.8.2-RC1, and the latest commit id
> is: 66c47f2a01ad9637879e95f80c41f798373828fb
>
> >   The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1064
>
>   Please try the release and vote; the vote will run for the usual 5
> days, ending on 10/24/2017 6pm PST time.
>
> Thanks,
>
> Junping
>
>


Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Ravi Prakash
Thanks for all your hard work Junping!

* Checked signature.
* Ran a sleep job.
* Checked NN File browser UI works.

+1 (binding)

Cheers
Ravi

On Tue, Oct 24, 2017 at 12:26 PM, Rakesh Radhakrishnan 
wrote:

> Thanks Junping for getting this out.
>
> +1 (non-binding)
>
> * Built from source on CentOS 7.3.1611, jdk1.8.0_111
> * Deployed 3 node cluster
> * Ran some sample jobs
> * Ran balancer
> * Operate HDFS from command line: ls, put, dfsadmin etc
> * HDFS Namenode UI looks good
>
>
> Thanks,
> Rakesh
>
> On Fri, Oct 20, 2017 at 6:12 AM, Junping Du  wrote:
>
> > Hi folks,
> >  I've created our new release candidate (RC1) for Apache Hadoop
> 2.8.2.
> >
> >  Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line
> > and will be the latest stable/production release for Apache Hadoop - it
> > includes 315 new fixed issues since 2.8.1 and 69 fixes are marked as
> > blocker/critical issues.
> >
> >   More information about the 2.8.2 release plan can be found here:
> > https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release
> >
> >   New RC is available at: http://home.apache.org/~junping_du/hadoop-2.8.2-RC1
> >
> >   The RC tag in git is: release-2.8.2-RC1, and the latest commit id
> > is: 66c47f2a01ad9637879e95f80c41f798373828fb
> >
> >   The maven artifacts are available via repository.apache.org at:
> > https://repository.apache.org/content/repositories/orgapachehadoop-1064
> >
> >   Please try the release and vote; the vote will run for the usual 5
> > days, ending on 10/24/2017 6pm PST time.
> >
> > Thanks,
> >
> > Junping
> >
> >
>


[jira] [Resolved] (HDFS-12686) Erasure coding system policy state is not correctly saved and loaded during real cluster restart

2017-10-24 Thread Xiao Chen (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Chen resolved HDFS-12686.
--
Resolution: Duplicate

Since HDFS-12682 should be able to handle this, and it's pretty hard to split 
that work into 2 jiras due to protobuf changes, I'll resolve this as a dup.

Thanks Sammi for filing the jira and Wei-Chiu for checking!

> Erasure coding system policy state is not correctly saved and loaded during 
> real cluster restart
> 
>
> Key: HDFS-12686
> URL: https://issues.apache.org/jira/browse/HDFS-12686
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 3.0.0-beta1
>Reporter: SammiChen
>Assignee: SammiChen
>Priority: Blocker
>  Labels: hdfs-ec-3.0-must-do
>
> Inspired by HDFS-12682, I found the system erasure coding policy state will 
> not be correctly saved and loaded in a real cluster, though there are unit 
> tests for this and they all pass with MiniCluster. That is because the 
> MiniCluster keeps the same static system erasure coding policy object across 
> the NN restart operation. 



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Junping Du
Allen,
 Do we have any solid evidence to show that the HDFS unit tests going 
through the roof are due to a serious memory leak in HDFS? Normally, I don't 
expect memory leaks to be identified in our UTs - mostly, it (the test JVM 
going away) is just because of test or deployment issues. 
 Unless there is concrete evidence, my concern about a serious memory leak 
in HDFS on 2.8 is relatively low, given that some companies (Yahoo, Alibaba, 
etc.) have deployed 2.8 in large production environments for months. 
Non-serious memory leaks (like forgetting to close a stream in a non-critical 
path, etc.) and other non-critical bugs always happen here and there, and we 
have to live with them.

Thanks,

Junping


From: Allen Wittenauer 
Sent: Tuesday, October 24, 2017 8:27 AM
To: Hadoop Common
Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer  
> wrote:
>
>
>
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that’s what 
> this “feels” like.
>
> Someone should verify if 2.8.2 has the same issues before a release goes out …


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn’t die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM in the HDFS unit tests are 
going through the roof.  It’s also interesting how MANY log files there are.  
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting 
tight.

Anyway, at the point, branch-2.8 and higher are probably fubar’d. 
Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker containers 
can have their RAM limits set in order to prevent more nodes going catatonic.



-
To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org



-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Sean Busbey
Just curious, Junping what would "solid evidence" look like? Is the
supposition here that the memory leak is within HDFS test code rather than
library runtime code? How would such a distinction be shown?

On Tue, Oct 24, 2017 at 4:06 PM, Junping Du  wrote:

> Allen,
>  Do we have any solid evidence to show the HDFS unit tests going
> through the roof are due to serious memory leak by HDFS? Normally, I don't
> expect memory leak are identified in our UTs - mostly, it (test jvm gone)
> is just because of test or deployment issues.
>  Unless there is concrete evidence, my concern on seriously memory
> leak for HDFS on 2.8 is relatively low given some companies (Yahoo,
> Alibaba, etc.) have deployed 2.8 on large production environment for
> months. Non-serious memory leak (like forgetting to close stream in
> non-critical path, etc.) and other non-critical bugs always happens here
> and there that we have to live with.
>
> Thanks,
>
> Junping
>
> 
> From: Allen Wittenauer 
> Sent: Tuesday, October 24, 2017 8:27 AM
> To: Hadoop Common
> Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
> > On Oct 23, 2017, at 12:50 PM, Allen Wittenauer 
> wrote:
> >
> >
> >
> > With no other information or access to go on, my current hunch is that
> one of the HDFS unit tests is ballooning in memory size.  The easiest way
> to kill a Linux machine is to eat all of the RAM, thanks to overcommit and
> that’s what this “feels” like.
> >
> > Someone should verify if 2.8.2 has the same issues before a release goes
> out …
>
>
> FWIW, I ran 2.8.2 last night and it has the same problems.
>
> Also: the node didn’t die!  Looking through the workspace (so the
> next run will destroy them), two sets of logs stand out:
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
> and
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>
> It looks like my hunch is correct:  RAM in the HDFS unit tests are
> going through the roof.  It’s also interesting how MANY log files there
> are.  Is surefire not picking up that jobs are dying?  Maybe not if memory
> is getting tight.
>
> Anyway, at the point, branch-2.8 and higher are probably fubar’d.
> Additionally, I’ve filed YETUS-561 so that Yetus-controlled Docker
> containers can have their RAM limits set in order to prevent more nodes
> going catatonic.
>
>
>
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>
>
> -
> To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: common-dev-h...@hadoop.apache.org
>
>


-- 
busbey


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Junping Du
In general, the "solid evidence" of a memory leak comes from analysis of a 
heapdump, jstack, gc log, etc. In many cases, we can locate/conclude which 
piece of code is leaking memory from the analysis.
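
(As an aside, such a heapdump can even be captured programmatically from 
inside a suspect test JVM. A minimal sketch, using the HotSpot-specific 
com.sun.management API, so it is illustrative rather than portable:)

{code}
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import com.sun.management.HotSpotDiagnosticMXBean;

public class HeapDumper {
  // Write an .hprof heap dump that tools like jhat or MAT can analyze;
  // live=true dumps only reachable objects, which is usually what you
  // want for leak analysis.
  public static void dump(String path) throws Exception {
    MBeanServer server = ManagementFactory.getPlatformMBeanServer();
    HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
        server, "com.sun.management:type=HotSpotDiagnostic",
        HotSpotDiagnosticMXBean.class);
    bean.dumpHeap(path, true /* live objects only */);
  }
}
{code}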

Unfortunately, I cannot find any conclusion from the previous comments, and 
they cannot even tell which daemons/components of HDFS consume unexpectedly 
high memory. Doesn't sound like a solid bug report to me.



Thanks,


Junping



From: Sean Busbey 
Sent: Tuesday, October 24, 2017 2:20 PM
To: Junping Du
Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

Just curious, Junping what would "solid evidence" look like? Is the supposition 
here that the memory leak is within HDFS test code rather than library runtime 
code? How would such a distinction be shown?

On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
Allen,
 Do we have any solid evidence to show the HDFS unit tests going through 
the roof are due to serious memory leak by HDFS? Normally, I don't expect 
memory leak are identified in our UTs - mostly, it (test jvm gone) is just 
because of test or deployment issues.
 Unless there is concrete evidence, my concern on seriously memory leak for 
HDFS on 2.8 is relatively low given some companies (Yahoo, Alibaba, etc.) have 
deployed 2.8 on large production environment for months. Non-serious memory 
leak (like forgetting to close stream in non-critical path, etc.) and other 
non-critical bugs always happens here and there that we have to live with.

Thanks,

Junping


From: Allen Wittenauer <a...@effectivemachines.com>
Sent: Tuesday, October 24, 2017 8:27 AM
To: Hadoop Common
Cc: Hdfs-dev; 
mapreduce-...@hadoop.apache.org; 
yarn-...@hadoop.apache.org
Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>
>
>
> With no other information or access to go on, my current hunch is that one of 
> the HDFS unit tests is ballooning in memory size.  The easiest way to kill a 
> Linux machine is to eat all of the RAM, thanks to overcommit and that's what 
> this "feels" like.
>
> Someone should verify if 2.8.2 has the same issues before a release goes out 
> ...


FWIW, I ran 2.8.2 last night and it has the same problems.

Also: the node didn't die!  Looking through the workspace (so the next 
run will destroy them), two sets of logs stand out:

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt

and

https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/

It looks like my hunch is correct:  RAM in the HDFS unit tests are 
going through the roof.  It's also interesting how MANY log files there are.  
Is surefire not picking up that jobs are dying?  Maybe not if memory is getting 
tight.

Anyway, at the point, branch-2.8 and higher are probably fubar'd. 
Additionally, I've filed YETUS-561 so that Yetus-controlled Docker containers 
can have their RAM limits set in order to prevent more nodes going catatonic.



-
To unsubscribe, e-mail: 
yarn-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 
yarn-dev-h...@hadoop.apache.org



-
To unsubscribe, e-mail: 
common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: 
common-dev-h...@hadoop.apache.org




--
busbey


Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Chris Douglas
Sean/Junping-

Ignoring the epistemology, it's a problem. Let's figure out what's
causing memory to balloon and then we can work out the appropriate
remedy.

Is this reproducible outside the CI environment? To Junping's point,
would YETUS-561 provide more detailed information to aid debugging? -C

On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
> In general, the "solid evidence" of memory leak comes from analysis of 
> heapdump, jstack, gc log, etc. In many cases, we can locate/conclude which 
> piece of code are leaking memory from the analysis.
>
> Unfortunately, I cannot find any conclusion from previous comments and it 
> even cannot tell which daemons/components of HDFS consumes unexpected high 
> memory. Doesn't sound like a solid bug report to me.
>
>
>
> Thanks,
>
>
> Junping
>
>
> 
> From: Sean Busbey 
> Sent: Tuesday, October 24, 2017 2:20 PM
> To: Junping Du
> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; 
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
> Just curious, Junping what would "solid evidence" look like? Is the 
> supposition here that the memory leak is within HDFS test code rather than 
> library runtime code? How would such a distinction be shown?
>
> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
> Allen,
>  Do we have any solid evidence to show the HDFS unit tests going through 
> the roof are due to serious memory leak by HDFS? Normally, I don't expect 
> memory leak are identified in our UTs - mostly, it (test jvm gone) is just 
> because of test or deployment issues.
>  Unless there is concrete evidence, my concern on seriously memory leak 
> for HDFS on 2.8 is relatively low given some companies (Yahoo, Alibaba, etc.) 
> have deployed 2.8 on large production environment for months. Non-serious 
> memory leak (like forgetting to close stream in non-critical path, etc.) and 
> other non-critical bugs always happens here and there that we have to live 
> with.
>
> Thanks,
>
> Junping
>
> 
> From: Allen Wittenauer <a...@effectivemachines.com>
> Sent: Tuesday, October 24, 2017 8:27 AM
> To: Hadoop Common
> Cc: Hdfs-dev; 
> mapreduce-...@hadoop.apache.org; 
> yarn-...@hadoop.apache.org
> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>
>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>>
>>
>>
>> With no other information or access to go on, my current hunch is that one 
>> of the HDFS unit tests is ballooning in memory size.  The easiest way to 
>> kill a Linux machine is to eat all of the RAM, thanks to overcommit and 
>> that's what this "feels" like.
>>
>> Someone should verify if 2.8.2 has the same issues before a release goes out 
>> ...
>
>
> FWIW, I ran 2.8.2 last night and it has the same problems.
>
> Also: the node didn't die!  Looking through the workspace (so the 
> next run will destroy them), two sets of logs stand out:
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>
> and
>
> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>
> It looks like my hunch is correct:  RAM in the HDFS unit tests are 
> going through the roof.  It's also interesting how MANY log files there are.  
> Is surefire not picking up that jobs are dying?  Maybe not if memory is 
> getting tight.
>
> Anyway, at the point, branch-2.8 and higher are probably fubar'd. 
> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker containers 
> can have their RAM limits set in order to prevent more nodes going catatonic.
>
>
>
> -
> To unsubscribe, e-mail: 
> yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: 
> yarn-dev-h...@hadoop.apache.org
>
>
>
> -
> To unsubscribe, e-mail: 
> common-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: 
> common-dev-h...@hadoop.apache.org
>
>
>
>
> --
> busbey

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-12502) nntop should support a category based on FilesInGetListingOps

2017-10-24 Thread Zhe Zhang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhe Zhang reopened HDFS-12502:
--

> nntop should support a category based on FilesInGetListingOps
> -
>
> Key: HDFS-12502
> URL: https://issues.apache.org/jira/browse/HDFS-12502
> Project: Hadoop HDFS
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Zhe Zhang
>Assignee: Zhe Zhang
> Fix For: 2.9.0, 2.8.3, 3.0.0, 3.1.0
>
> Attachments: HDFS-12502.00.patch, HDFS-12502.01.patch, 
> HDFS-12502.02.patch, HDFS-12502.03.patch, HDFS-12502.04.patch
>
>
> Large listing ops can oftentimes be the main contributor to NameNode 
> slowness. The aggregate cost of listing ops is proportional to the 
> {{FilesInGetListingOps}} rather than the number of listing ops. Therefore 
> it'd be very useful for nntop to support this category.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: [VOTE] Release Apache Hadoop 2.8.2 (RC1)

2017-10-24 Thread Eric Payne
+1 (binding)
Thanks a lot, Junping!
I built and installed the source on a 6-node pseudo cluster. I ran simple 
sleep and streaming jobs that exercised intra-queue and inter-queue 
preemption, and used user weights.
-Eric

  From: Junping Du 
 To: "common-...@hadoop.apache.org" ; 
"hdfs-dev@hadoop.apache.org" ; 
"mapreduce-...@hadoop.apache.org" ; 
"yarn-...@hadoop.apache.org"  
 Sent: Thursday, October 19, 2017 7:43 PM
 Subject: [VOTE] Release Apache Hadoop 2.8.2 (RC1)
   
Hi folks,
    I've created our new release candidate (RC1) for Apache Hadoop 2.8.2.

    Apache Hadoop 2.8.2 is the first stable release of Hadoop 2.8 line and will 
be the latest stable/production release for Apache Hadoop - it includes 315 new 
fixed issues since 2.8.1 and 69 fixes are marked as blocker/critical issues.

      More information about the 2.8.2 release plan can be found here: 
https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+2.8+Release

      New RC is available at: 
http://home.apache.org/~junping_du/hadoop-2.8.2-RC1

      The RC tag in git is: release-2.8.2-RC1, and the latest commit id is: 
66c47f2a01ad9637879e95f80c41f798373828fb

      The maven artifacts are available via 
repository.apache.org at: 
https://repository.apache.org/content/repositories/orgapachehadoop-1064

      Please try the release and vote; the vote will run for the usual 5 days, 
ending on 10/24/2017 6pm PST time.

Thanks,

Junping


   

[jira] [Created] (HDFS-12707) start-all script is missing ozone start

2017-10-24 Thread Bharat Viswanadham (JIRA)
Bharat Viswanadham created HDFS-12707:
-

 Summary: start-all script is missing ozone start
 Key: HDFS-12707
 URL: https://issues.apache.org/jira/browse/HDFS-12707
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Bharat Viswanadham


start-all script is missing ozone start



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

My plan is currently to:

*  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561 
patch to test it out. 
* if the tests work, work on getting YETUS-561 committed to yetus master
* switch jobs back to ASF yetus master either post-YETUS-561 or without it if 
it doesn’t work
* go back to working on something else, regardless of the outcome


> On Oct 24, 2017, at 2:55 PM, Chris Douglas  wrote:
> 
> Sean/Junping-
> 
> Ignoring the epistemology, it's a problem. Let's figure out what's
> causing memory to balloon and then we can work out the appropriate
> remedy.
> 
> Is this reproducible outside the CI environment? To Junping's point,
> would YETUS-561 provide more detailed information to aid debugging? -C
> 
> On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
>> In general, the "solid evidence" of memory leak comes from analysis of 
>> heapdump, jstack, gc log, etc. In many cases, we can locate/conclude which 
>> piece of code are leaking memory from the analysis.
>> 
>> Unfortunately, I cannot find any conclusion from previous comments and it 
>> even cannot tell which daemons/components of HDFS consumes unexpected high 
>> memory. Doesn't sound like a solid bug report to me.
>> 
>> 
>> 
>> Thanks,
>> 
>> 
>> Junping
>> 
>> 
>> 
>> From: Sean Busbey 
>> Sent: Tuesday, October 24, 2017 2:20 PM
>> To: Junping Du
>> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; 
>> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>> Just curious, Junping what would "solid evidence" look like? Is the 
>> supposition here that the memory leak is within HDFS test code rather than 
>> library runtime code? How would such a distinction be shown?
>> 
>> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
>> Allen,
>> Do we have any solid evidence to show the HDFS unit tests going through 
>> the roof are due to serious memory leak by HDFS? Normally, I don't expect 
>> memory leak are identified in our UTs - mostly, it (test jvm gone) is just 
>> because of test or deployment issues.
>> Unless there is concrete evidence, my concern on seriously memory leak 
>> for HDFS on 2.8 is relatively low given some companies (Yahoo, Alibaba, 
>> etc.) have deployed 2.8 on large production environment for months. 
>> Non-serious memory leak (like forgetting to close stream in non-critical 
>> path, etc.) and other non-critical bugs always happens here and there that 
>> we have to live with.
>> 
>> Thanks,
>> 
>> Junping
>> 
>> 
>> From: Allen Wittenauer <a...@effectivemachines.com>
>> Sent: Tuesday, October 24, 2017 8:27 AM
>> To: Hadoop Common
>> Cc: Hdfs-dev; 
>> mapreduce-...@hadoop.apache.org; 
>> yarn-...@hadoop.apache.org
>> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
>> 
>>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>>> 
>>> 
>>> 
>>> With no other information or access to go on, my current hunch is that one 
>>> of the HDFS unit tests is ballooning in memory size.  The easiest way to 
>>> kill a Linux machine is to eat all of the RAM, thanks to overcommit and 
>>> that's what this "feels" like.
>>> 
>>> Someone should verify if 2.8.2 has the same issues before a release goes 
>>> out ...
>> 
>> 
>>FWIW, I ran 2.8.2 last night and it has the same problems.
>> 
>>Also: the node didn't die!  Looking through the workspace (so the 
>> next run will destroy them), two sets of logs stand out:
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
>> 
>>and
>> 
>> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
>> 
>>It looks like my hunch is correct:  RAM in the HDFS unit tests are 
>> going through the roof.  It's also interesting how MANY log files there are. 
>>  Is surefire not picking up that jobs are dying?  Maybe not if memory is 
>> getting tight.
>> 
>>Anyway, at the point, branch-2.8 and higher are probably fubar'd. 
>> Additionally, I've filed YETUS-561 so that Yetus-controlled Docker 
>> containers can have their RAM limits set in order to prevent more nodes 
>> going catatonic.
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: 
>> yarn-dev-unsubscr...@hadoop.apache.org
>> For additional commands, e-mail: 
>> yarn-dev-h...@hadoop.apache.org
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: 
>> common-dev-unsubscr...@hadoop.apache.org

Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Andrew Wang
FWIW we've been running branch-3.0 unit tests successfully internally,
though we have separate jobs for Common, HDFS, YARN, and MR. The failures
here are probably a property of running everything in the same JVM, which
I've found problematic in the past due to OOMs.

On Tue, Oct 24, 2017 at 4:04 PM, Allen Wittenauer 
wrote:

>
> My plan is currently to:
>
> *  switch some of Hadoop’s Yetus jobs over to my branch with the YETUS-561
> patch to test it out.
> * if the tests work, work on getting YETUS-561 committed to yetus master
> * switch jobs back to ASF yetus master either post-YETUS-561 or without it
> if it doesn’t work
> * go back to working on something else, regardless of the outcome
>
>
> > On Oct 24, 2017, at 2:55 PM, Chris Douglas  wrote:
> >
> > Sean/Junping-
> >
> > Ignoring the epistemology, it's a problem. Let's figure out what's
> > causing memory to balloon and then we can work out the appropriate
> > remedy.
> >
> > Is this reproducible outside the CI environment? To Junping's point,
> > would YETUS-561 provide more detailed information to aid debugging? -C
> >
> > On Tue, Oct 24, 2017 at 2:50 PM, Junping Du  wrote:
> >> In general, the "solid evidence" of memory leak comes from analysis of
> heapdump, jstack, gc log, etc. In many cases, we can locate/conclude which
> piece of code are leaking memory from the analysis.
> >>
> >> Unfortunately, I cannot find any conclusion from previous comments and
> it even cannot tell which daemons/components of HDFS consumes unexpected
> high memory. Doesn't sound like a solid bug report to me.
> >>
> >>
> >>
> >> Thanks,
> >>
> >>
> >> Junping
> >>
> >>
> >> 
> >> From: Sean Busbey 
> >> Sent: Tuesday, October 24, 2017 2:20 PM
> >> To: Junping Du
> >> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev;
> mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >> Just curious, Junping what would "solid evidence" look like? Is the
> supposition here that the memory leak is within HDFS test code rather than
> library runtime code? How would such a distinction be shown?
> >>
> >> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du wrote:
> >> Allen,
> >> Do we have any solid evidence to show the HDFS unit tests going
> through the roof are due to serious memory leak by HDFS? Normally, I don't
> expect memory leak are identified in our UTs - mostly, it (test jvm gone)
> is just because of test or deployment issues.
> >> Unless there is concrete evidence, my concern on seriously memory
> leak for HDFS on 2.8 is relatively low given some companies (Yahoo,
> Alibaba, etc.) have deployed 2.8 on large production environment for
> months. Non-serious memory leak (like forgetting to close stream in
> non-critical path, etc.) and other non-critical bugs always happens here
> and there that we have to live with.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> 
> >> From: Allen Wittenauer <a...@effectivemachines.com>
> >> Sent: Tuesday, October 24, 2017 8:27 AM
> >> To: Hadoop Common
> >> Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
> >>>
> >>>
> >>>
> >>> With no other information or access to go on, my current hunch is that
> one of the HDFS unit tests is ballooning in memory size.  The easiest way
> to kill a Linux machine is to eat all of the RAM, thanks to overcommit and
> that's what this "feels" like.
> >>>
> >>> Someone should verify if 2.8.2 has the same issues before a release
> goes out ...
> >>
> >>
> >>FWIW, I ran 2.8.2 last night and it has the same problems.
> >>
> >>Also: the node didn't die!  Looking through the workspace (so
> the next run will destroy them), two sets of logs stand out:
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> >>
> >>and
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
> >>
> >>It looks like my hunch is correct:  RAM in the HDFS unit tests
> are going through the roof.  It's also interesting how MANY log files there
> are.  Is surefire not picking up that jobs are dying?  Maybe not if memory
> is getting tight.
> >>
> >>Anyway, at the point, branch-2.8 and higher are probably
> fubar'd. Additionally, I've filed YETUS-561 so that Yetus-controlled Docker
> containers can have their RAM limits set in order to prevent more nodes
> going catatonic.
> >>
> >>
> >>
> >> --

Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Allen Wittenauer

> On Oct 24, 2017, at 4:10 PM, Andrew Wang  wrote:
> 
> FWIW we've been running branch-3.0 unit tests successfully internally, though 
> we have separate jobs for Common, HDFS, YARN, and MR. The failures here are 
> probably a property of running everything in the same JVM, which I've found 
> problematic in the past due to OOMs.

Last time I looked, surefire was configured to launch unit tests in 
different JVMs.  But that might only be true in trunk.  Or maybe only for some 
of the subprojects.  
-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86

2017-10-24 Thread Subramaniam V K
Allen, can we bump up the maven surefire heap size to the max (if it is not
already) for the branch-2 nightly build and see if it helps?

Thanks,
Subru

On Tue, Oct 24, 2017 at 4:22 PM, Allen Wittenauer 
wrote:

>
> > On Oct 24, 2017, at 4:10 PM, Andrew Wang 
> wrote:
> >
> > FWIW we've been running branch-3.0 unit tests successfully internally,
> though we have separate jobs for Common, HDFS, YARN, and MR. The failures
> here are probably a property of running everything in the same JVM, which
> I've found problematic in the past due to OOMs.
>
> Last time I looked, surefire was configured to launch unit tests
> in different JVMs.  But that might only be true in trunk.  Or maybe only
> for some of the subprojects.
> -
> To unsubscribe, e-mail: yarn-dev-unsubscr...@hadoop.apache.org
> For additional commands, e-mail: yarn-dev-h...@hadoop.apache.org
>
>


[jira] [Created] (HDFS-12708) Fix hdfs haadmin usage

2017-10-24 Thread fang zhenyi (JIRA)
fang zhenyi created HDFS-12708:
--

 Summary: Fix hdfs haadmin usage
 Key: HDFS-12708
 URL: https://issues.apache.org/jira/browse/HDFS-12708
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: fang zhenyi
Assignee: fang zhenyi
Priority: Minor
 Fix For: 3.1.0






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



[jira] [Reopened] (HDFS-12532) DN Reg can Fail when principal doesn't contain hostname and floatingIP is configured.

2017-10-24 Thread Brahma Reddy Battula (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-12532?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brahma Reddy Battula reopened HDFS-12532:
-
  Assignee: Brahma Reddy Battula

> DN Reg can Fail when principal doesn't contain hostname and floatingIP is 
> configured.
> -
>
> Key: HDFS-12532
> URL: https://issues.apache.org/jira/browse/HDFS-12532
> Project: Hadoop HDFS
>  Issue Type: Bug
>Reporter: Brahma Reddy Battula
>Assignee: Brahma Reddy Battula
>
> Configure principal without hostname (i.e hdfs/had...@hadoop.com)
> Configure floatingIP
> Start Cluster.
> Here the DN will fail to register, as it can take an IP which is not in "/etc/hosts".



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org



Re: Use HAAdmin API

2017-10-24 Thread Mihir Monani
If you want to fail over the NameNode (or even kill the NameNode process)
from a program, shell operations (like ./hdfs haadmin -getServiceState nn)
are currently mandatory.

For the DataNode there is a function, DFSClient#datanodeReport, which
provides the list of LIVE DataNodes.

To avoid shelling out to hdfs haadmin, I was looking for some function
that can provide the name of the active NameNode.

Is there anything implemented in Hadoop that provides the active NameNode?
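
(For reference, a rough sketch of what that lookup can look like today, using
the same private APIs that hdfs haadmin uses internally. NNHAServiceTarget is
not a public interface, so this may break across releases, and the
nameservice/NameNode ids below are assumed rather than discovered:)

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ha.HAServiceProtocol;
import org.apache.hadoop.ha.HAServiceProtocol.HAServiceState;
import org.apache.hadoop.hdfs.tools.NNHAServiceTarget;

public class ActiveNameNodeFinder {
  // Sketch only: asks each configured NameNode for its HA state and
  // returns the first one reporting ACTIVE. "mycluster"/"nn1"/"nn2"
  // are assumed ids; a real client would read them from the conf.
  public static String findActive(Configuration conf) throws IOException {
    String nsId = "mycluster";
    for (String nnId : new String[] {"nn1", "nn2"}) {
      NNHAServiceTarget target = new NNHAServiceTarget(conf, nsId, nnId);
      HAServiceProtocol proxy = target.getProxy(conf, 5000);
      HAServiceState state = proxy.getServiceStatus().getState();
      if (state == HAServiceState.ACTIVE) {
        return nnId + " (" + target.getAddress() + ")";
      }
    }
    throw new IOException("No active NameNode found for " + nsId);
  }
}
{code}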

On Fri, Oct 20, 2017 at 4:25 AM, Arpit Agarwal 
wrote:

> Mihir,
>
> HAAdmin is a private interface. Most of its functionality is exposed via
> the ‘hdfs haadmin’ command [1]. Will that work for you?
>
> 1. https://hadoop.apache.org/docs/r2.7.0/hadoop-project-dist/hadoop-hdfs/HDFSCommands.html#haadmin
>
>
>
>
> On 10/17/17, 4:28 AM, "Mihir Monani"  wrote:
>
> I wanted to write a Chaos Action for the HBase Chaos Monkey (something like
> RestartActiveMasterAction, see
> src/test/java/org/apache/hadoop/hbase/chaos/actions/RestartActiveMasterAction.java)
> which can trigger NN failover.
>
> For that I was going through HAAdmin.java. Is there any way I can use
> functions like HAAdmin#failover and HAAdmin#getServiceState from the HAAdmin
> class?
>
> Can someone guide me how do i use them?
>
> Thanks,
> Mihir Monani
>
>
>