FWIW we've been running branch-3.0 unit tests successfully internally, though we have separate jobs for Common, HDFS, YARN, and MR. The failures here are probably a property of running everything in the same JVM, which I've found problematic in the past due to OOMs.
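(For anyone who wants to try the same split locally, a rough sketch of what per-module runs can look like with stock Maven, assuming the standard Hadoop source layout; the exact module list and flags used by the internal jobs mentioned above may differ:)

    # Build everything once without running tests, then run each project's tests in
    # its own Maven invocation (and therefore its own forked JVMs), so one module's
    # heap pressure can't take the whole run down.
    mvn install -DskipTests
    mvn test -pl hadoop-common-project/hadoop-common
    mvn test -pl hadoop-hdfs-project/hadoop-hdfs
    mvn test -pl hadoop-yarn-project
    mvn test -pl hadoop-mapreduce-project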
On Tue, Oct 24, 2017 at 4:04 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>
> My plan is currently to:
>
> * switch some of Hadoop's Yetus jobs over to my branch with the YETUS-561 patch to test it out.
> * if the tests work, work on getting YETUS-561 committed to yetus master
> * switch jobs back to ASF yetus master either post-YETUS-561 or without it if it doesn't work
> * go back to working on something else, regardless of the outcome
>
>
> > On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdoug...@apache.org> wrote:
> >
> > Sean/Junping-
> >
> > Ignoring the epistemology, it's a problem. Let's figure out what's causing memory to balloon and then we can work out the appropriate remedy.
> >
> > Is this reproducible outside the CI environment? To Junping's point, would YETUS-561 provide more detailed information to aid debugging? -C
> >
> > On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
> >> In general, the "solid evidence" of a memory leak comes from analysis of heap dumps, jstack output, GC logs, etc. In many cases we can locate, and conclude, which piece of code is leaking memory from that analysis.
> >>
> >> Unfortunately, I cannot find any conclusion in the previous comments, and they don't even say which HDFS daemons/components are consuming unexpectedly high memory. That doesn't sound like a solid bug report to me.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> ________________________________
> >> From: Sean Busbey <bus...@cloudera.com>
> >> Sent: Tuesday, October 24, 2017 2:20 PM
> >> To: Junping Du
> >> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >> Just curious, Junping: what would "solid evidence" look like? Is the supposition here that the memory leak is within HDFS test code rather than library runtime code? How would such a distinction be shown?
> >>
> >> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
> >> Allen,
> >>      Do we have any solid evidence that the HDFS unit tests going through the roof are due to a serious memory leak in HDFS? Normally I don't expect memory leaks to be identified by our UTs - mostly, a test JVM going away is just a test or deployment issue.
> >>      Unless there is concrete evidence, my concern about a serious memory leak in HDFS 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.) have had 2.8 deployed in large production environments for months. Non-serious memory leaks (like forgetting to close a stream on a non-critical path) and other non-critical bugs always happen here and there; we have to live with them.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> ________________________________________
> >> From: Allen Wittenauer <a...@effectivemachines.com>
> >> Sent: Tuesday, October 24, 2017 8:27 AM
> >> To: Hadoop Common
> >> Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
> >>>
> >>> With no other information or access to go on, my current hunch is that one of the HDFS unit tests is ballooning in memory size. The easiest way to kill a Linux machine is to eat all of its RAM, thanks to overcommit, and that's what this "feels" like.
> >>>
> >>> Someone should verify whether 2.8.2 has the same issues before a release goes out ...
> >>
> >> FWIW, I ran 2.8.2 last night and it has the same problems.
> >>
> >> Also: the node didn't die! Looking through the workspace (the next run will destroy it), two sets of logs stand out:
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> >>
> >> and
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
> >>
> >> It looks like my hunch is correct: RAM use in the HDFS unit tests is going through the roof. It's also interesting how MANY log files there are. Is surefire not picking up that jobs are dying? Maybe not, if memory is getting tight.
> >>
> >> Anyway, at this point branch-2.8 and higher are probably fubar'd. Additionally, I've filed YETUS-561 so that Yetus-controlled Docker containers can have their RAM limits set, in order to prevent more nodes from going catatonic.
> >>
> >> --
> >> busbey
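On the "solid evidence" question above: the kind of data Junping is asking for can be grabbed from a still-running (or about-to-die) surefire fork with the stock JDK tools. A rough sketch only, with <pid> standing in for a forked test JVM's process id rather than anything taken from the actual failing runs:

    # Find the surefire forks and capture a thread dump plus a heap dump, so a
    # leaking test (if there is one) can be identified from the .hprof instead
    # of guessed at.
    jps -lv | grep -i surefire                          # list forked test JVMs
    jstack <pid> > jstack.txt                           # thread dump
    jmap -dump:live,format=b,file=heap.hprof <pid>      # heap dump for MAT/jhat
    # Or let the JVM capture the evidence itself when a fork blows up, by adding
    # standard HotSpot flags to the surefire argLine (illustrative, not current
    # project configuration):
    #   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps
    #   -verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails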
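On the YETUS-561 angle: the patch itself isn't shown here, but the standard Docker mechanism such a limit would presumably sit on top of is the memory cgroup flags, roughly:

    # Generic Docker options for capping a container's RAM (stock docker-run flags,
    # not necessarily how YETUS-561 exposes them; 20g is an arbitrary example value).
    docker run --memory=20g --memory-swap=20g ... <image> <build command>

With a hard cap in place, a runaway test run gets OOM-killed inside the container instead of driving the whole node into overcommit death.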