Thanks for the replies, responses inline:

On Wed, Mar 16, 2016 at 3:36 PM, Reynold Xin <r...@databricks.com> wrote:

> There is no way to really know that, because users might run queries at
> any given point.
>
> BTW why can't your threads be just daemon threads?
>

The bigger issue is that we require the Kudu client to be closed manually
so that it can perform necessary cleanup tasks.  During shutdown the client
stops its non-daemon threads and, more importantly, flushes any outstanding
batched writes to the server.
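For context, the shape of the problem can be sketched with a small
self-contained stand-in (illustrative only: BatchingClient and its methods
are made up for this sketch and are not the real Kudu client API):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for a client that batches writes on a background
// thread.  The real Kudu client is more involved, but the shape is the
// same: close() must run, or buffered writes are silently lost.
class BatchingClient implements AutoCloseable {
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final AtomicInteger flushed = new AtomicInteger();
    private volatile boolean closed = false;
    private final Thread flusher;

    BatchingClient() {
        flusher = new Thread(() -> {
            // Periodically drain the batch; in a real client this is where
            // buffered operations would be sent to the server.
            while (!closed) {
                drain();
                try {
                    Thread.sleep(50);
                } catch (InterruptedException e) {
                    break;
                }
            }
        });
        // Non-daemon on purpose: the JVM should not exit out from under
        // pending writes.  This is exactly what keeps the shell hanging.
        flusher.setDaemon(false);
        flusher.start();
    }

    void write(String row) {
        pending.add(row);
    }

    private void drain() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        flushed.addAndGet(batch.size()); // stand-in for the flush RPC
    }

    int flushedCount() {
        return flushed.get();
    }

    @Override
    public void close() throws InterruptedException {
        closed = true;
        flusher.interrupt();
        flusher.join();   // stop the non-daemon thread
        drain();          // final flush of anything still buffered
    }
}
```

If close() never runs, the non-daemon flusher keeps the JVM alive and any
rows still sitting in the buffer never reach the server, which is why a
"just let the JVM die" answer doesn't work here.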

On Wed, Mar 16, 2016 at 3:35 PM, Hamel Kothari <hamelkoth...@gmail.com>
 wrote:

> Dan,
>
> You could probably just register a JVM shutdown hook yourself:
> https://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread
> )
>
> This at least would let you close the connections when the application as
> a whole has completed (in standalone) or when your executors have been
> killed (in YARN). I think that's as close as you'll get to knowing when an
> executor will no longer have any tasks in the current state of the world.
>

The Spark shell will not run shutdown hooks after a <ctrl>-D if there are
non-daemon threads running.  You can test this with the following input to
the shell:

new Thread(new Runnable {
  override def run(): Unit = {
    while (true) {
      println("running")
      Thread.sleep(10000)
    }
  }
}).start()

Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
  override def run(): Unit = println("shutdown fired")
}))

- Dan



>
> On Wed, Mar 16, 2016 at 3:29 PM, Dan Burkert <d...@cloudera.com> wrote:
>
>> Hi Reynold,
>>
>> Is there any way to know when an executor will no longer have any tasks?
>> It seems to me that no timeout is both long enough to ensure that no more
>> tasks will be scheduled on the executor and short enough to be acceptable
>> to wait on during an interactive shell shutdown.
>>
>> - Dan
>>
>> On Wed, Mar 16, 2016 at 2:40 PM, Reynold Xin <r...@databricks.com> wrote:
>>
>>> Maybe just add a watchdog thread and close the connection upon some
>>> timeout?
>>>
>>>
>>> On Wednesday, March 16, 2016, Dan Burkert <d...@cloudera.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I'm working on the Spark connector for Apache Kudu, and I've run into
>>>> an issue that is a bit beyond my Spark knowledge. The Kudu connector
>>>> internally holds an open connection to the Kudu cluster
>>>> <https://github.com/apache/incubator-kudu/blob/master/java/kudu-spark/src/main/scala/org/kududb/spark/KuduContext.scala#L37>
>>>>  which
>>>> internally holds a Netty context with non-daemon threads. When using the
>>>> Spark shell with the Kudu connector, exiting the shell via <ctrl>-D causes
>>>> the shell to hang, and a thread dump reveals it's waiting for these
>>>> non-daemon threads.  Registering a JVM shutdown hook to close the Kudu
>>>> client does not do the trick, as it seems that the shutdown hooks are not
>>>> fired on <ctrl>-D.
>>>>
>>>> I see that there is an internal Spark API for handling shutdown
>>>> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala>,
>>>> is there something similar available for cleaning up external data sources?
>>>>
>>>> - Dan
>>>>
>>>
>>
>
