Thanks for the replies, responses inline:

On Wed, Mar 16, 2016 at 3:36 PM, Reynold Xin <r...@databricks.com> wrote:
> There is no way to really know that, because users might run queries at
> any given point.
>
> BTW why can't your threads be just daemon threads?

The bigger issue is that we require the Kudu client to be manually closed
so that it can do necessary cleanup tasks. During shutdown the client shuts
down its non-daemon threads, but more importantly, it flushes any
outstanding batched writes to the server.

On Wed, Mar 16, 2016 at 3:35 PM, Hamel Kothari <hamelkoth...@gmail.com> wrote:

> Dan,
>
> You could probably just register a JVM shutdown hook yourself:
> https://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)
>
> This at least would let you close the connections when the application as
> a whole has completed (in standalone) or when your executors have been
> killed (in YARN). I think that's as close as you'll get to knowing when an
> executor will no longer have any tasks in the current state of the world.

The Spark shell will not run shutdown hooks after a <ctrl>-D if there are
non-daemon threads running. You can test this with the following input to
the shell:

    new Thread(new Runnable {
      override def run() = {
        while (true) {
          println("running")
          Thread.sleep(10000)
        }
      }
    }).start()

    Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
      override def run() = println("shutdown fired")
    }))

- Dan

> On Wed, Mar 16, 2016 at 3:29 PM, Dan Burkert <d...@cloudera.com> wrote:
>
>> Hi Reynold,
>>
>> Is there any way to know when an executor will no longer have any tasks?
>> It seems to me that no timeout is both long enough to ensure that no
>> more tasks will be scheduled on the executor and short enough to be
>> appropriate to wait on during an interactive shell shutdown.
>>
>> - Dan
>>
>> On Wed, Mar 16, 2016 at 2:40 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> Maybe just add a watchdog thread and close the connection upon some
>>> timeout?
>>>
>>> On Wednesday, March 16, 2016, Dan Burkert <d...@cloudera.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm working on the Spark connector for Apache Kudu, and I've run into
>>>> an issue that is a bit beyond my Spark knowledge. The Kudu connector
>>>> internally holds an open connection to the Kudu cluster
>>>> <https://github.com/apache/incubator-kudu/blob/master/java/kudu-spark/src/main/scala/org/kududb/spark/KuduContext.scala#L37>,
>>>> which in turn holds a Netty context with non-daemon threads. When
>>>> using the Spark shell with the Kudu connector, exiting the shell via
>>>> <ctrl>-D causes the shell to hang, and a thread dump reveals it is
>>>> waiting on these non-daemon threads. Registering a JVM shutdown hook
>>>> to close the Kudu client does not do the trick, as the shutdown hooks
>>>> are not fired on <ctrl>-D.
>>>>
>>>> I see that there is an internal Spark API for handling shutdown
>>>> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala>;
>>>> is there something similar available for cleaning up external data
>>>> sources?
>>>>
>>>> - Dan
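P.S. For anyone following along, Reynold's daemon-thread suggestion from
up-thread can be sketched as below. This is an illustrative ThreadFactory,
not the Kudu client's actual Netty configuration; it shows why daemon
threads let the JVM exit, and also why that alone is not enough here:
a daemon thread is killed abruptly at exit, so buffered writes would
never be flushed.

    import java.util.concurrent.{Executors, ThreadFactory}

    // A ThreadFactory that marks every worker thread as a daemon, so
    // the JVM can exit even while the pool's threads are still running.
    val daemonFactory = new ThreadFactory {
      override def newThread(r: Runnable): Thread = {
        val t = new Thread(r)
        t.setDaemon(true)
        t
      }
    }

    val pool = Executors.newSingleThreadExecutor(daemonFactory)
    pool.submit(new Runnable {
      // Stand-in for a long-running I/O thread.
      override def run(): Unit = Thread.sleep(60000)
    })

    // The JVM can exit here despite the sleeping worker -- but anything
    // still buffered in that worker is lost, which is why a daemon
    // thread is not a substitute for an explicit client close() that
    // flushes pending writes.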