FWIW we've been running branch-3.0 unit tests successfully internally, though we have separate jobs for Common, HDFS, YARN, and MR. The failures here are probably a property of running everything in the same JVM, which I've found problematic in the past due to OOMs.
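(For anyone who wants to try the same split locally, a rough sketch of what per-module runs can look like with stock Maven, assuming the standard Hadoop source layout; the exact module list and flags used by the internal jobs mentioned above may differ:)

    # Build everything once without running tests, then run each project's tests in
    # its own Maven invocation (and therefore its own forked JVMs), so one module's
    # heap pressure can't take the whole run down.
    mvn install -DskipTests
    mvn test -pl hadoop-common-project/hadoop-common
    mvn test -pl hadoop-hdfs-project/hadoop-hdfs
    mvn test -pl hadoop-yarn-project
    mvn test -pl hadoop-mapreduce-project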
On Tue, Oct 24, 2017 at 4:04 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
>
> My plan is currently to:
>
> * switch some of Hadoop's Yetus jobs over to my branch with the YETUS-561 patch to test it out.
> * if the tests work, work on getting YETUS-561 committed to yetus master
> * switch jobs back to ASF yetus master either post-YETUS-561 or without it if it doesn't work
> * go back to working on something else, regardless of the outcome
>
>
> > On Oct 24, 2017, at 2:55 PM, Chris Douglas <cdoug...@apache.org> wrote:
> >
> > Sean/Junping-
> >
> > Ignoring the epistemology, it's a problem. Let's figure out what's causing memory to balloon and then we can work out the appropriate remedy.
> >
> > Is this reproducible outside the CI environment? To Junping's point, would YETUS-561 provide more detailed information to aid debugging? -C
> >
> > On Tue, Oct 24, 2017 at 2:50 PM, Junping Du <j...@hortonworks.com> wrote:
> >> In general, the "solid evidence" of a memory leak comes from analysis of heap dumps, jstack output, GC logs, etc. In many cases we can locate, and conclude, which piece of code is leaking memory from that analysis.
> >>
> >> Unfortunately, I cannot find any conclusion in the previous comments, and they don't even say which HDFS daemons/components are consuming unexpectedly high memory. That doesn't sound like a solid bug report to me.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> ________________________________
> >> From: Sean Busbey <bus...@cloudera.com>
> >> Sent: Tuesday, October 24, 2017 2:20 PM
> >> To: Junping Du
> >> Cc: Allen Wittenauer; Hadoop Common; Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >> Just curious, Junping: what would "solid evidence" look like? Is the supposition here that the memory leak is within HDFS test code rather than library runtime code? How would such a distinction be shown?
> >>
> >> On Tue, Oct 24, 2017 at 4:06 PM, Junping Du <j...@hortonworks.com> wrote:
> >> Allen,
> >>      Do we have any solid evidence that the HDFS unit tests going through the roof are due to a serious memory leak in HDFS? Normally I don't expect memory leaks to be identified by our UTs - mostly, a test JVM going away is just a test or deployment issue.
> >>      Unless there is concrete evidence, my concern about a serious memory leak in HDFS 2.8 is relatively low, given that some companies (Yahoo, Alibaba, etc.) have had 2.8 deployed in large production environments for months. Non-serious memory leaks (like forgetting to close a stream on a non-critical path) and other non-critical bugs always happen here and there; we have to live with them.
> >>
> >> Thanks,
> >>
> >> Junping
> >>
> >> ________________________________________
> >> From: Allen Wittenauer <a...@effectivemachines.com>
> >> Sent: Tuesday, October 24, 2017 8:27 AM
> >> To: Hadoop Common
> >> Cc: Hdfs-dev; mapreduce-...@hadoop.apache.org; yarn-...@hadoop.apache.org
> >> Subject: Re: Apache Hadoop qbt Report: branch2+JDK7 on Linux/x86
> >>
> >>> On Oct 23, 2017, at 12:50 PM, Allen Wittenauer <a...@effectivemachines.com> wrote:
> >>>
> >>> With no other information or access to go on, my current hunch is that one of the HDFS unit tests is ballooning in memory size. The easiest way to kill a Linux machine is to eat all of its RAM, thanks to overcommit, and that's what this "feels" like.
> >>>
> >>> Someone should verify whether 2.8.2 has the same issues before a release goes out ...
> >>
> >> FWIW, I ran 2.8.2 last night and it has the same problems.
> >>
> >> Also: the node didn't die! Looking through the workspace (the next run will destroy it), two sets of logs stand out:
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/out/patch-unit-hadoop-hdfs-project_hadoop-hdfs.txt
> >>
> >> and
> >>
> >> https://builds.apache.org/job/hadoop-qbt-branch2-java7-linux-x86/ws/sourcedir/hadoop-hdfs-project/hadoop-hdfs/
> >>
> >> It looks like my hunch is correct: RAM use in the HDFS unit tests is going through the roof. It's also interesting how MANY log files there are. Is surefire not picking up that jobs are dying? Maybe not, if memory is getting tight.
> >>
> >> Anyway, at this point branch-2.8 and higher are probably fubar'd. Additionally, I've filed YETUS-561 so that Yetus-controlled Docker containers can have their RAM limits set, in order to prevent more nodes from going catatonic.
> >>
> >> --
> >> busbey
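On the "solid evidence" question above: the kind of data Junping is asking for can be grabbed from a still-running (or about-to-die) surefire fork with the stock JDK tools. A rough sketch only, with <pid> standing in for a forked test JVM's process id rather than anything taken from the actual failing runs:

    # Find the surefire forks and capture a thread dump plus a heap dump, so a
    # leaking test (if there is one) can be identified from the .hprof instead
    # of guessed at.
    jps -lv | grep -i surefire                          # list forked test JVMs
    jstack <pid> > jstack.txt                           # thread dump
    jmap -dump:live,format=b,file=heap.hprof <pid>      # heap dump for MAT/jhat
    # Or let the JVM capture the evidence itself when a fork blows up, by adding
    # standard HotSpot flags to the surefire argLine (illustrative, not current
    # project configuration):
    #   -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/path/to/dumps
    #   -verbose:gc -Xloggc:gc.log -XX:+PrintGCDetails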
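On the YETUS-561 angle: the patch itself isn't shown here, but the standard Docker mechanism such a limit would presumably sit on top of is the memory cgroup flags, roughly:

    # Generic Docker options for capping a container's RAM (stock docker-run flags,
    # not necessarily how YETUS-561 exposes them; 20g is an arbitrary example value).
    docker run --memory=20g --memory-swap=20g ... <image> <build command>

With a hard cap in place, a runaway test run gets OOM-killed inside the container instead of driving the whole node into overcommit death.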