After further thought, I think following both of your suggestions (adding a
shutdown hook and making the threads non-daemon) may have the result I'm
looking for.  I'll check and see whether there are other reasons not to use
daemon threads in our networking internals.  More generally though, what do
y'all think about having Spark shut down or close RelationProviders once
they are no longer needed?  It seems to me that RelationProviders will often
be stateful objects with network and/or file resources.  I checked the C*
Spark connector, and it jumps through a bunch of hoops to handle this
issue, including shutdown hooks and a ref-counted cache.
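For the record, the ref-counting approach could look roughly like the sketch
below.  All names here (`Client`, `ClientCache`) are hypothetical stand-ins,
not the actual connector code; the idea is just that the last `release()`
closes the shared client, with a shutdown hook as a best-effort backstop:

```scala
// Hypothetical sketch of a ref-counted client cache with a JVM shutdown hook.
// `Client` stands in for a real client that owns non-daemon threads and
// buffered writes that must be flushed on close.
class Client {
  @volatile var closed = false
  def close(): Unit = closed = true // flush pending writes, stop threads, etc.
}

object ClientCache {
  private var client: Client = _
  private var refCount = 0

  // Best-effort cleanup if a release() is missed.  Note the caveat from this
  // thread: the Spark shell may never run this hook while non-daemon threads
  // are still alive.
  Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
    override def run(): Unit = ClientCache.synchronized {
      if (client != null) client.close()
    }
  }))

  def acquire(): Client = synchronized {
    if (refCount == 0) client = new Client
    refCount += 1
    client
  }

  def release(): Unit = synchronized {
    refCount -= 1
    if (refCount == 0) { client.close(); client = null }
  }
}
```

Each task (or context) pairs an `acquire()` with a `release()`, so the client
is only torn down once nothing is using it.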

- Dan

On Wed, Mar 16, 2016 at 4:04 PM, Dan Burkert <d...@cloudera.com> wrote:

> Thanks for the replies, responses inline:
>
> On Wed, Mar 16, 2016 at 3:36 PM, Reynold Xin <r...@databricks.com> wrote:
>
>> There is no way to really know that, because users might run queries at
>> any given point.
>>
>> BTW why can't your threads be just daemon threads?
>>
>
> The bigger issue is that we require the Kudu client to be manually closed
> so that it can do necessary cleanup tasks.  During shutdown the client
> closes the non-daemon threads, but more importantly, it flushes any
> outstanding batched writes to the server.
>
> On Wed, Mar 16, 2016 at 3:35 PM, Hamel Kothari <hamelkoth...@gmail.com>
>  wrote:
>
>> Dan,
>>
>> You could probably just register a JVM shutdown hook yourself:
>> https://docs.oracle.com/javase/7/docs/api/java/lang/Runtime.html#addShutdownHook(java.lang.Thread)
>>
>> This at least would let you close the connections when the application as
>> a whole has completed (in standalone) or when your executors have been
>> killed (in YARN). I think that's as close as you'll get to knowing when an
>> executor will no longer have any tasks in the current state of the world.
>>
>
> The Spark shell will not run shutdown hooks after a <ctrl>-D if there are
> non-daemon threads running.  You can test this with the following input to
> the shell:
>
> new Thread(new Runnable {
>   override def run() = {
>     while (true) { println("running"); Thread.sleep(10000) }
>   }
> }).start()
> Runtime.getRuntime.addShutdownHook(new Thread(new Runnable {
>   override def run() = println("shutdown fired")
> }))
>
> - Dan
>
>
>
>>
>> On Wed, Mar 16, 2016 at 3:29 PM, Dan Burkert <d...@cloudera.com> wrote:
>>
>>> Hi Reynold,
>>>
>>> Is there any way to know when an executor will no longer have any
>>> tasks?  It seems to me there is no timeout that is both long enough to
>>> ensure no more tasks will be scheduled on the executor and short enough
>>> to be acceptable to wait on during an interactive shell shutdown.
>>>
>>> - Dan
>>>
>>> On Wed, Mar 16, 2016 at 2:40 PM, Reynold Xin <r...@databricks.com>
>>> wrote:
>>>
>>>> Maybe just add a watchdog thread and close the connection upon some
>>>> timeout?
>>>>
>>>>
>>>> On Wednesday, March 16, 2016, Dan Burkert <d...@cloudera.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I'm working on the Spark connector for Apache Kudu, and I've run into
>>>>> an issue that is a bit beyond my Spark knowledge. The Kudu connector
>>>>> internally holds an open connection to the Kudu cluster
>>>>> <https://github.com/apache/incubator-kudu/blob/master/java/kudu-spark/src/main/scala/org/kududb/spark/KuduContext.scala#L37>
>>>>>  which
>>>>> internally holds a Netty context with non-daemon threads. When using the
>>>>> Spark shell with the Kudu connector, exiting the shell via <ctrl>-D causes
>>>>> the shell to hang, and a thread dump reveals it's waiting for these
>>>>> non-daemon threads.  Registering a JVM shutdown hook to close the Kudu
>>>>> client does not do the trick, as it seems that the shutdown hooks are not
>>>>> fired on <ctrl>-D.
>>>>>
>>>>> I see that there is an internal Spark API for handling shutdown
>>>>> <https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala>,
>>>>> is there something similar available for cleaning up external data 
>>>>> sources?
>>>>>
>>>>> - Dan
>>>>>
>>>>
>>>
>>
>