Did you get a thread dump? We have experienced similar hangs during shuffle operations due to a deadlock in InetAddress. Specifically, look for a thread in the RUNNABLE state whose top frame is something like "java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)".
Our "solution" has been to put a timeout around the code that was sometimes causing it, until we can upgrade our OS.

On Wed, Jul 29, 2015 at 5:32 AM, Andy Zhao <[email protected]> wrote:

> Hi guys,
>
> A job hung for about 16 hours when I ran the random forest algorithm, and I
> don't know why that happened.
> I use Spark 1.4.0 on YARN and here is the code
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n24047/1.png>
>
> and the following picture is from the Spark UI
> <http://apache-spark-user-list.1001560.n3.nabble.com/file/n24047/2.png>
> Can anybody help?
>
> Thanks
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Job-hang-when-running-random-forest-tp24047.html
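For reference, the timeout workaround mentioned above could look roughly like the sketch below. This is not our actual code, just a minimal illustration that assumes the blocking call is a hostname lookup via InetAddress.getByName. The class name LookupWithTimeout, the method resolve, and the 2000 ms timeout are made up for the example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: runs the lookup on a separate thread and gives up
// waiting after a deadline instead of blocking the caller forever.
public class LookupWithTimeout {

    // Daemon threads, so a lookup that never returns cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "dns-lookup");
        t.setDaemon(true);
        return t;
    });

    public static InetAddress resolve(String host, long timeoutMs)
            throws UnknownHostException, TimeoutException, InterruptedException {
        Callable<InetAddress> task = () -> InetAddress.getByName(host);
        Future<InetAddress> future = POOL.submit(task);
        try {
            // Throws TimeoutException if the lookup has not finished in time.
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (ExecutionException e) {
            if (e.getCause() instanceof UnknownHostException) {
                throw (UnknownHostException) e.getCause();
            }
            throw new IllegalStateException(e.getCause());
        } finally {
            // No-op if the task already finished. A thread stuck in native
            // code may ignore the interrupt and stay blocked.
            future.cancel(true);
        }
    }

    public static void main(String[] args) throws Exception {
        // An IP literal is parsed directly, so this call needs no DNS query.
        System.out.println(resolve("127.0.0.1", 2000).getHostAddress());
    }
}
```

Note that this only lets the caller move on. If the lookup thread is truly deadlocked in native code it stays stuck in the background, which is why we treat it as a stopgap rather than a fix.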
