Hi Marcelo,

I took a thread dump with jstack, and I do see the ShutdownHookManager thread:
'''
"Thread-1" #19 prio=5 os_prio=0 tid=0x00007f9b6828e800 nid=0x77cb waiting on condition [0x00007f9a123e3000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x00000005408a5420> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:350)
        at org.apache.spark.scheduler.AsyncEventQueue.stop(AsyncEventQueue.scala:131)
        at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219)
        at org.apache.spark.scheduler.LiveListenerBus$$anonfun$stop$1.apply(LiveListenerBus.scala:219)
        at scala.collection.Iterator$class.foreach(Iterator.scala:893)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
        at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
        at org.apache.spark.scheduler.LiveListenerBus.stop(LiveListenerBus.scala:219)
        - locked <0x00000005400c8f40> (a org.apache.spark.scheduler.LiveListenerBus)
        at org.apache.spark.SparkContext$$anonfun$stop$6.apply$mcV$sp(SparkContext.scala:1915)
        at org.apache.spark.util.Utils$.tryLogNonFatalError(Utils.scala:1357)
        at org.apache.spark.SparkContext.stop(SparkContext.scala:1914)
        at org.apache.spark.SparkContext$$anonfun$2.apply$mcV$sp(SparkContext.scala:572)
        at org.apache.spark.util.SparkShutdownHook.run(ShutdownHookManager.scala:216)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1$$anonfun$apply$mcV$sp$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1988)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply$mcV$sp(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anonfun$runAll$1.apply(ShutdownHookManager.scala:188)
        at scala.util.Try$.apply(Try.scala:192)
        at org.apache.spark.util.SparkShutdownHookManager.runAll(ShutdownHookManager.scala:188)
        at org.apache.spark.util.SparkShutdownHookManager$$anon$2.run(ShutdownHookManager.scala:178)
        at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
......

"main" #1 prio=5 os_prio=0 tid=0x00007f9d50020000 nid=0x6a25 in Object.wait() [0x00007f9d58f69000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Thread.join(Thread.java:1249)
        - locked <0x00000005404fe248> (a org.apache.hadoop.util.ShutdownHookManager$1)
        at java.lang.Thread.join(Thread.java:1323)
        at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
        at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
        at java.lang.Shutdown.runHooks(Shutdown.java:123)
        at java.lang.Shutdown.sequence(Shutdown.java:167)
        at java.lang.Shutdown.exit(Shutdown.java:212)
        - locked <0x00000005404938f0> (a java.lang.Class for java.lang.Shutdown)
        at java.lang.Runtime.exit(Runtime.java:109)
        at java.lang.System.exit(System.java:971)
        at scala.sys.package$.exit(package.scala:40)
        at scala.sys.package$.exit(package.scala:33)
        at actionmodel.ParallelAdvertiserBeaconModel$.main(ParallelAdvertiserBeaconModel.scala:253)
        at actionmodel.ParallelAdvertiserBeaconModel.main(ParallelAdvertiserBeaconModel.scala)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
        at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
...
'''

What shall I do then? Thanks!
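Following up on the earlier point that non-daemon pool threads keep the JVM alive: one way to avoid that in the first place is to build the pool with a daemon ThreadFactory. This is only a sketch; the helper name and the "train-worker" thread-name prefix are made up for illustration, not from the original code:

```scala
import java.util.concurrent.{Executors, ExecutorService, ThreadFactory}
import java.util.concurrent.atomic.AtomicInteger

// Sketch: a fixed-size pool whose threads are marked as daemons, so a
// stuck worker can no longer keep the JVM alive after main() returns.
def daemonPool(size: Int): ExecutorService =
  Executors.newFixedThreadPool(size, new ThreadFactory {
    private val counter = new AtomicInteger(0)
    override def newThread(r: Runnable): Thread = {
      val t = new Thread(r, s"train-worker-${counter.incrementAndGet()}")
      t.setDaemon(true) // daemon threads do not prevent JVM shutdown
      t
    }
  })
```

With such a pool, `implicit val xc = ExecutionContext.fromExecutorService(daemonPool(3))` behaves as before; the trade-off is that daemon threads are killed abruptly at shutdown, so any work still in flight is lost.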


On Wed, Jan 16, 2019 at 1:15 PM Marcelo Vanzin <van...@cloudera.com> wrote:

> Those are daemon threads and not the cause of the problem. The main
> thread is waiting for the "org.apache.hadoop.util.ShutdownHookManager"
> thread, but I don't see that one in your list.
>
> On Wed, Jan 16, 2019 at 12:08 PM Pola Yao <pola....@gmail.com> wrote:
> >
> > Hi Marcelo,
> >
> > Thanks for your response.
> >
> > I have dumped the threads on the server where I submitted the spark
> application:
> >
> > '''
> > ...
> > "dispatcher-event-loop-2" #28 daemon prio=5 os_prio=0 tid=0x00007f56cee0e000 nid=0x1cb6 waiting on condition [0x00007f5699811000]
> >    java.lang.Thread.State: WAITING (parking)
> > at sun.misc.Unsafe.park(Native Method)
> > - parking to wait for  <0x00000006400161b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> > at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> >
> > "dispatcher-event-loop-1" #27 daemon prio=5 os_prio=0 tid=0x00007f56cee0c800 nid=0x1cb5 waiting on condition [0x00007f5699912000]
> >    java.lang.Thread.State: WAITING (parking)
> > at sun.misc.Unsafe.park(Native Method)
> > - parking to wait for  <0x00000006400161b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> > at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> >
> > "dispatcher-event-loop-0" #26 daemon prio=5 os_prio=0 tid=0x00007f56cee0c000 nid=0x1cb4 waiting on condition [0x00007f569a120000]
> >    java.lang.Thread.State: WAITING (parking)
> > at sun.misc.Unsafe.park(Native Method)
> > - parking to wait for  <0x00000006400161b8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> > at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> > at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
> > at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> > at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:215)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> >
> > "Service Thread" #20 daemon prio=9 os_prio=0 tid=0x00007f56cc12d800 nid=0x1ca5 runnable [0x0000000000000000]
> >    java.lang.Thread.State: RUNNABLE
> >
> > "C1 CompilerThread14" #19 daemon prio=9 os_prio=0 tid=0x00007f56cc12a000 nid=0x1ca4 waiting on condition [0x0000000000000000]
> >    java.lang.Thread.State: RUNNABLE
> > ...
> > "Finalizer" #3 daemon prio=8 os_prio=0 tid=0x00007f56cc0ce000 nid=0x1c93 in Object.wait() [0x00007f56ab3f2000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> > at java.lang.Object.wait(Native Method)
> > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:143)
> > - locked <0x00000006400cd498> (a java.lang.ref.ReferenceQueue$Lock)
> > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:164)
> > at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
> >
> > "Reference Handler" #2 daemon prio=10 os_prio=0 tid=0x00007f56cc0c9800 nid=0x1c92 in Object.wait() [0x00007f55cfffe000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> > at java.lang.Object.wait(Native Method)
> > at java.lang.Object.wait(Object.java:502)
> > at java.lang.ref.Reference.tryHandlePending(Reference.java:191)
> > - locked <0x00000006400a2660> (a java.lang.ref.Reference$Lock)
> > at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:153)
> >
> > "main" #1 prio=5 os_prio=0 tid=0x00007f56cc021000 nid=0x1c74 in Object.wait() [0x00007f56d344c000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> > at java.lang.Object.wait(Native Method)
> > at java.lang.Thread.join(Thread.java:1249)
> > - locked <0x000000064056f6a0> (a org.apache.hadoop.util.ShutdownHookManager$1)
> > at java.lang.Thread.join(Thread.java:1323)
> > at java.lang.ApplicationShutdownHooks.runHooks(ApplicationShutdownHooks.java:106)
> > at java.lang.ApplicationShutdownHooks$1.run(ApplicationShutdownHooks.java:46)
> > at java.lang.Shutdown.runHooks(Shutdown.java:123)
> > at java.lang.Shutdown.sequence(Shutdown.java:167)
> > at java.lang.Shutdown.exit(Shutdown.java:212)
> > - locked <0x00000006404e65b8> (a java.lang.Class for java.lang.Shutdown)
> > at java.lang.Runtime.exit(Runtime.java:109)
> > at java.lang.System.exit(System.java:971)
> > at scala.sys.package$.exit(package.scala:40)
> > at scala.sys.package$.exit(package.scala:33)
> > at actionmodel.ParallelAdvertiserBeaconModel$.main(ParallelAdvertiserBeaconModel.scala:252)
> > at actionmodel.ParallelAdvertiserBeaconModel.main(ParallelAdvertiserBeaconModel.scala)
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> > at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:498)
> > at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
> > at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
> > at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
> > at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
> > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
> > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
> >
> > "VM Thread" os_prio=0 tid=0x00007f56cc0c1800 nid=0x1c91 runnable
> > ...
> > '''
> >
> > I have no clear idea what went wrong. I did call awaitTermination to
> terminate the thread pool. Or is there any way to force-close all those
> 'WAITING' threads associated with my Spark application?
> >
> > On Wed, Jan 16, 2019 at 8:31 AM Marcelo Vanzin <van...@cloudera.com>
> wrote:
> >>
> >> If System.exit() doesn't work, you may have a bigger problem
> >> somewhere. Check your threads (using e.g. jstack) to see what's going
> >> on.
> >>
> >> On Wed, Jan 16, 2019 at 8:09 AM Pola Yao <pola....@gmail.com> wrote:
> >> >
> >> > Hi Marcelo,
> >> >
> >> > Thanks for your reply! It made sense to me. However, I've tried many
> ways to exit the spark (e.g., System.exit()), but failed. Is there an
> explicit way to shutdown all the alive threads in the spark application and
> then quit afterwards?
> >> >
> >> >
> >> > On Tue, Jan 15, 2019 at 2:38 PM Marcelo Vanzin <van...@cloudera.com>
> wrote:
> >> >>
> >> >> You should check the active threads in your app. Since your pool uses
> >> >> non-daemon threads, that will prevent the app from exiting.
> >> >>
> >> >> spark.stop() should have stopped the Spark jobs in other threads, at
> >> >> least. But if something is blocking one of those threads, or if
> >> >> something is creating a non-daemon thread that stays alive somewhere,
> >> >> you'll see that.
> >> >>
> >> >> Or you can force quit with sys.exit.
> >> >>
> >> >> On Tue, Jan 15, 2019 at 1:30 PM Pola Yao <pola....@gmail.com> wrote:
> >> >> >
> >> >> > I submitted a Spark job through the ./spark-submit command; the code
> executed successfully, however the application got stuck when trying
> to quit Spark.
> >> >> >
> >> >> > My code snippet:
> >> >> > '''
> >> >> > {
> >> >> >
> >> >> > val spark = SparkSession.builder.master(...).getOrCreate
> >> >> >
> >> >> > val pool = Executors.newFixedThreadPool(3)
> >> >> > implicit val xc = ExecutionContext.fromExecutorService(pool)
> >> >> > val taskList = List(train1, train2, train3)  // where train* is a
> Future function which wrapped up some data reading and feature engineering
> and machine learning steps
> >> >> > val results = Await.result(Future.sequence(taskList), 20 minutes)
> >> >> >
> >> >> > println("Shutting down pool and executor service")
> >> >> > pool.shutdown()
> >> >> > xc.shutdown()
> >> >> >
> >> >> > println("Exiting spark")
> >> >> > spark.stop()
> >> >> >
> >> >> > }
> >> >> > '''
> >> >> >
> >> >> > After I submitted the job, from the terminal, I could see that the
> code executed and printed "Exiting spark"; however, after printing that
> line, it never exited Spark, it just got stuck.
> >> >> >
> >> >> > Does anybody know what the reason is? Or how to force it to quit?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> >
> >> >>
> >> >>
> >> >> --
> >> >> Marcelo
> >>
> >>
> >>
> >> --
> >> Marcelo
>
>
>
> --
> Marcelo
>
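Putting the suggestions from this thread together, the teardown from the original snippet can be ordered so the pool is fully drained before Spark is stopped and before any explicit exit. A sketch; the pool size of 3 is from the original snippet, while the 60-second timeout is an assumption:

```scala
import java.util.concurrent.{Executors, TimeUnit}
import scala.concurrent.ExecutionContext

// Sketch of an orderly teardown for the snippet earlier in the thread.
// The 60-second timeout is illustrative, not from the original code.
val pool = Executors.newFixedThreadPool(3)
implicit val xc = ExecutionContext.fromExecutorService(pool)

// ... run the train* futures and Await.result(...) as in the original ...

pool.shutdown()                            // stop accepting new tasks
if (!pool.awaitTermination(60, TimeUnit.SECONDS))
  pool.shutdownNow()                       // interrupt any stragglers
// spark.stop() would go here, on the main thread, before any explicit
// exit, so the listener bus is not stopped from a shutdown hook racing
// with System.exit. If the JVM still hangs, sys.exit(0) as a last resort.
```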
