Philip Black-Knight created SOLR-6579:
-----------------------------------------
Summary: SnapPuller Replication blocks clean shutdown of tomcat
Key: SOLR-6579
URL: https://issues.apache.org/jira/browse/SOLR-6579
Project: Solr
Issue Type: Bug
Affects Versions: 4.10.1
Reporter: Philip Black-Knight
main issue was described in the mailing list her:
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201409.mbox/browser
and
here:
but also including the quotes:
original message from Nick
{quote}
Hello,
I have solr 4.10 running on tomcat 7. I'm doing replication from one master
to about 10 slaves, with standard configuration:
{code}
<requestHandler name="/replication" class="solr.ReplicationHandler" >
<lst name="master">
<str name="enable">${enable.master:false}</str>
<str name="replicateAfter">commit</str>
<str name="replicateAfter">startup</str>
<str name="confFiles">schema.xml,stopwords.txt</str>
<!-- str name="commitReserveDuration">00:00:10</str -->
</lst>
<lst name="slave">
<str name="enable">${enable.slave:false}</str>
<str name="masterUrl">http://master:8080/solr/mycore</str>
<str name="pollInterval">00:00:60</str>
</lst>
</requestHandler>
{code}
It appears that if tomcat gets shutdown while solr is replicating, it
prevents tomcat from shutting down fully. Immediately after receiving the
shutdown command, a thread dump is logged into catalina.out (this may have
been turned on by some configuration someone else on my team made). I
removed some threads that didn't look related, mostly about tomcat session
replication, or with names like "http-bio-8080-exec-10".
{code}
62252 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore –
[mycore] webapp=/solr path=/replication
params={command=details&_=1412014928648&wt=json} status=0 QTime=6
63310 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore –
[mycore] webapp=/solr path=/replication
params={command=details&_=1412014929699&wt=json} status=0 QTime=6
2014-09-29 14:22:10
Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed
mode):
"fsyncService-12-thread-1" prio=10 tid=0x00007f3bd4002000 nid=0x203d
waiting on condition [0x00007f3c271f0000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007e1ff4458> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"explicit-fetchindex-cmd" daemon prio=10 tid=0x00007f3c0413e800
nid=0x203c runnable [0x00007f3c272f1000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at
org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
at
org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:174)
at
org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at
org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
at
org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:114)
at
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:152)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1239)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1187)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:774)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:424)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)
"process reaper" daemon prio=10 tid=0x00007f3c0409c000 nid=0x203b
waiting on condition [0x00007f3c984e9000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007dfbfd890> (a
java.util.concurrent.SynchronousQueue$TransferStack)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"process reaper" daemon prio=10 tid=0x00007f3c0409a000 nid=0x2039
waiting on condition [0x00007f3cac040000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x00000007dfbfd890> (a
java.util.concurrent.SynchronousQueue$TransferStack)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at
java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
at
java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
at
java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"ajp-bio-8009-AsyncTimeout" daemon prio=10 tid=0x00007f3ce013c000
nid=0x202e waiting on condition [0x00007f3c27af9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
at java.lang.Thread.run(Thread.java:745)
"ajp-bio-8009-Acceptor-0" daemon prio=10 tid=0x00007f3ce013a800
nid=0x202d runnable [0x00007f3c985ea000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.ServerSocket.implAccept(ServerSocket.java:530)
at java.net.ServerSocket.accept(ServerSocket.java:498)
at
org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
at
org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
at java.lang.Thread.run(Thread.java:745)
"http-bio-8080-exec-1" daemon prio=10 tid=0x00007f3c0c001800 nid=0x202c
runnable [0x00007f3c986eb000]
java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:152)
at java.net.SocketInputStream.read(SocketInputStream.java:122)
at
org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:516)
at
org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:501)
at
org.apache.coyote.http11.Http11Processor.setRequestLineReadTimeout(Http11Processor.java:173)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:927)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
- locked <0x00000007df0874b0> (a
org.apache.tomcat.util.net.SocketWrapper)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"http-bio-8080-AsyncTimeout" daemon prio=10 tid=0x00007f3ce0139000
nid=0x202b waiting on condition [0x00007f3cac1aa000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
at java.lang.Thread.run(Thread.java:745)
"http-bio-8080-Acceptor-0" daemon prio=10 tid=0x00007f3ce0138000
nid=0x202a runnable [0x00007f3cac2ab000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.ServerSocket.implAccept(ServerSocket.java:530)
at java.net.ServerSocket.accept(ServerSocket.java:498)
at
org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
at
org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
at java.lang.Thread.run(Thread.java:745)
"ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10
tid=0x00007f3ce05b4000 nid=0x2029 waiting on condition [0x00007f3cacbb7000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1508)
at java.lang.Thread.run(Thread.java:745)
"clock:10000" daemon prio=10 tid=0x00007f3c5cafc000 nid=0x2028 in
Object.wait() [0x00007f3c27bfa000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007dc6cf7e8> (a
org.webmacro.util.Clock$ClockThread)
at org.webmacro.util.Clock$ClockThread.run(Clock.java:108)
- locked <0x00000007dc6cf7e8> (a
org.webmacro.util.Clock$ClockThread)
"TimeLoop(1000,600)" daemon prio=10 tid=0x00007f3c5cac6000 nid=0x2027
waiting on condition [0x00007f3c27cfb000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at
org.webmacro.util.TimeLoop$TimeLoopThread.run(TimeLoop.java:237)
"pool-1-thread-2" daemon prio=10 tid=0x00007f3c5d1b3800 nid=0x2022
waiting on condition [0x00007f3c98246000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000070b90b4a0> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"Java2D Disposer" daemon prio=10 tid=0x00007f3c5cfeb800 nid=0x2021 in
Object.wait() [0x00007f3c9835d000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007e6030140> (a
java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x00000007e6030140> (a
java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at sun.java2d.Disposer.run(Disposer.java:145)
at java.lang.Thread.run(Thread.java:745)
"MySQL Statement Cancellation Timer" daemon prio=10
tid=0x00007f3c5ca0c000 nid=0x2020 in Object.wait() [0x00007f3c9845e000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007e5f45ad8> (a java.util.TaskQueue)
at java.lang.Object.wait(Object.java:503)
at java.util.TimerThread.mainLoop(Timer.java:526)
- locked <0x00000007e5f45ad8> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"Timer-0" daemon prio=10 tid=0x00007f3c5c9f7800 nid=0x201f in
Object.wait() [0x00007f3cad0bc000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x00000007e605a8d8> (a java.util.TaskQueue)
at java.lang.Object.wait(Object.java:503)
at java.util.TimerThread.mainLoop(Timer.java:526)
- locked <0x00000007e605a8d8> (a java.util.TaskQueue)
at java.util.TimerThread.run(Timer.java:505)
"Thread-5" daemon prio=10 tid=0x00007f3c5c00f800 nid=0x201e in
Object.wait() [0x00007f3cac3ac000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000070b92b230> (a java.lang.Object)
at java.lang.Object.wait(Object.java:503)
at org.apache.solr.core.CloserThread.run(CoreContainer.java:905)
- locked <0x000000070b92b230> (a java.lang.Object)
"snapPuller-10-thread-1" prio=10 tid=0x00007f3c6c26e800 nid=0x201d
waiting on condition [0x00007f3cac555000]
java.lang.Thread.State: TIMED_WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000070b92b2a8> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
at
java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"searcherExecutor-6-thread-1" prio=10 tid=0x00007f3c6c1ea000 nid=0x201c
waiting on condition [0x00007f3cac810000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x000000070b92b440> (a
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at
java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at
java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at
java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
"GC Daemon" daemon prio=10 tid=0x00007f3ce0536000 nid=0x200d in
Object.wait() [0x00007f3cd8137000]
java.lang.Thread.State: TIMED_WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
at sun.misc.GC$Daemon.run(GC.java:117)
- locked <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
"Service Thread" daemon prio=10 tid=0x00007f3ce00a6000 nid=0x200b
runnable [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread1" daemon prio=10 tid=0x00007f3ce00a4000 nid=0x200a
waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread0" daemon prio=10 tid=0x00007f3ce00a1000 nid=0x2009
waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Signal Dispatcher" daemon prio=10 tid=0x00007f3ce009f800 nid=0x2008
waiting on condition [0x0000000000000000]
java.lang.Thread.State: RUNNABLE
"Finalizer" daemon prio=10 tid=0x00007f3ce0077800 nid=0x2007 in
Object.wait() [0x00007f3cd8c1b000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000070bbee828> (a
java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
- locked <0x000000070bbee828> (a
java.lang.ref.ReferenceQueue$Lock)
at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
at
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
"Reference Handler" daemon prio=10 tid=0x00007f3ce0075800 nid=0x2006 in
Object.wait() [0x00007f3cadffe000]
java.lang.Thread.State: WAITING (on object monitor)
at java.lang.Object.wait(Native Method)
- waiting on <0x000000070b8b23b8> (a
java.lang.ref.Reference$Lock)
at java.lang.Object.wait(Object.java:503)
at
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
- locked <0x000000070b8b23b8> (a java.lang.ref.Reference$Lock)
"main" prio=10 tid=0x00007f3ce0009000 nid=0x1ffc runnable
[0x00007f3ce675c000]
java.lang.Thread.State: RUNNABLE
at java.net.PlainSocketImpl.socketAccept(Native Method)
at
java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
at java.net.ServerSocket.implAccept(ServerSocket.java:530)
at java.net.ServerSocket.accept(ServerSocket.java:498)
at
org.apache.catalina.core.StandardServer.await(StandardServer.java:452)
at org.apache.catalina.startup.Catalina.await(Catalina.java:766)
at org.apache.catalina.startup.Catalina.start(Catalina.java:712)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at
org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322)
at
org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:456)
JNI global references: 422
Heap
PSYoungGen total 1184256K, used 1031985K [0x00000007aaa80000,
0x0000000800000000, 0x0000000800000000)
eden space 970240K, 94% used
[0x00000007aaa80000,0x00000007e259e740,0x00000007e5e00000)
from space 214016K, 55% used
[0x00000007e5e00000,0x00000007ed2adfd0,0x00000007f2f00000)
to space 212480K, 0% used
[0x00000007f3080000,0x00000007f3080000,0x0000000800000000)
ParOldGen total 1398272K, used 209681K [0x0000000700000000,
0x0000000755580000, 0x00000007aaa80000)
object space 1398272K, 14% used
[0x0000000700000000,0x000000070ccc4700,0x0000000755580000)
PSPermGen total 78848K, used 78659K [0x00000006c0000000,
0x00000006c4d00000, 0x0000000700000000)
object space 78848K, 99% used
[0x00000006c0000000,0x00000006c4cd0ca8,0x00000006c4d00000)
right after this, here comes the shutdown:
Sep 29, 2014 2:22:10 PM org.apache.catalina.core.StandardServer await
INFO: A valid shutdown command was received via the shutdown port.
Stopping the Server instance.
Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler ["http-bio-8080"]
Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
INFO: Pausing ProtocolHandler ["ajp-bio-8009"]
Sep 29, 2014 2:22:11 PM org.apache.catalina.core.StandardService
stopInternal
INFO: Stopping service Catalina
Sep 29, 2014 2:22:11 PM
org.apache.catalina.ha.session.JvmRouteBinderValve stopInternal
INFO: JvmRouteBinderValve stopped
Sep 29, 2014 2:22:11 PM org.apache.catalina.ha.session.DeltaManager
stopInternal
INFO: Manager [localhost#/dy] expiring sessions upon shutdown
(snip some stuff about an unrelated webapp, then the solr errors)
64876 [localhost-startStop-2] INFO org.apache.solr.core.CoreContainer
– Shutting down CoreContainer instance=1359815787
64877 [localhost-startStop-2] INFO org.apache.solr.core.SolrCore –
[mycore] CLOSING SolrCore org.apache.solr.core.SolrCore@1ade5d50
64882 [localhost-startStop-2] INFO
org.apache.solr.update.UpdateHandler – closing
DirectUpdateHandler2{commits=0,autocommit
maxTime=15000ms,autocommits=0,soft
autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
64883 [localhost-startStop-2] INFO
org.apache.solr.update.SolrCoreState – Closing SolrCoreState
64883 [localhost-startStop-2] INFO
org.apache.solr.update.DefaultSolrCoreState – SolrCoreState ref count has
reached 0 - closing IndexWriter
64883 [localhost-startStop-2] INFO
org.apache.solr.update.DefaultSolrCoreState – closing IndexWriter with
IndexWriterCloser
64885 [localhost-startStop-2] ERROR
org.apache.solr.update.DefaultSolrCoreState – Error during shutdown of
writer.
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is
closed
at
org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:700)
at
org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2384)
at
org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:795)
at
org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:70)
at
org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:371)
at
org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:72)
at org.apache.solr.core.SolrCore.close(SolrCore.java:1073)
at org.apache.solr.core.SolrCores.close(SolrCores.java:117)
at
org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:347)
at
org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:200)
at
org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:315)
at
org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4782)
at
org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5565)
at
org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
at
org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1575)
at
org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1564)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
2014-09-29T14:22:11.384-0400: [GC 1299401K->384944K(2543104K),
0.0427280 secs]
64929 [localhost-startStop-2] INFO org.apache.solr.core.SolrCore –
[mycore] Closing main searcher on request.
64935 [localhost-startStop-2] INFO
org.apache.solr.core.CachingDirectoryFactory – Closing
NRTCachingDirectoryFactory - 3 directories currently being tracked
69937 [explicit-fetchindex-cmd] INFO
org.apache.solr.update.DefaultSolrCoreState – Creating new IndexWriter...
69938 [explicit-fetchindex-cmd] ERROR
org.apache.solr.handler.ReplicationHandler – SnapPull failed
:org.apache.lucene.store.AlreadyClosedException: Already closed
at
org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:341)
at
org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:827)
at
org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:569)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:511)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)
69942 [localhost-startStop-2] INFO
org.apache.solr.core.CachingDirectoryFactory – looking to close
/usr/local/apache-tomcat-7.0.39/solr/mycore/data
[CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data;done=false>>]
69942 [localhost-startStop-2] INFO
org.apache.solr.core.CachingDirectoryFactory – Closing directory:
/usr/local/apache-tomcat-7.0.39/solr/mycore/data
69942 [localhost-startStop-2] INFO
org.apache.solr.core.CachingDirectoryFactory – looking to close
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
[CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index;done=false>>]
69942 [localhost-startStop-2] INFO
org.apache.solr.core.CachingDirectoryFactory – Closing directory:
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
69942 [localhost-startStop-2] INFO
org.apache.solr.core.CachingDirectoryFactory – looking to close
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
[CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447;done=true>>]
69942 [localhost-startStop-2] INFO
org.apache.solr.core.CachingDirectoryFactory – Closing directory:
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
69943 [localhost-startStop-2] INFO
org.apache.solr.core.CachingDirectoryFactory – Removing directory before
core close:
/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
Sep 29, 2014 2:22:16 PM org.apache.catalina.ha.session.DeltaManager
stopInternal
INFO: Manager [localhost#/solr] expiring sessions upon shutdown
Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
clearReferencesThreads
SEVERE: The web application [/solr] appears to have started a thread
named [fsyncService-12-thread-1] but has failed to stop it. This is very
likely to create a memory leak.
Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
checkThreadLocalMapForLeaks
SEVERE: The web application [/solr] created a ThreadLocal with key of
type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
[org.apache.solr.schema.DateField$ThreadLocalDateFormat@66271f62]) and a
value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
(value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
but failed to remove it when the web application was stopped. Threads are
going to be renewed over time to try and avoid a probable memory leak.
Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
INFO: Stopping ProtocolHandler ["http-bio-8080"]
Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
INFO: Stopping ProtocolHandler ["ajp-bio-8009"]
Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
INFO: Destroying ProtocolHandler ["http-bio-8080"]
Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
INFO: Destroying ProtocolHandler ["ajp-bio-8009"]
{code}
At this point it looks like tomcat has shutdown, but the process is still
running, as checked by ps. If I start a new tomcat and repeat this process
a few times, we end up with several zombie instances of tomcat running.
Is this designed behavior? What is supposed to happen when the container is
shutdown while replication is running? How are others handling this?
Please let me know if I can provide any more information.
Thanks in advance,
Nick
{quote}
My followup
{quote}
I was helping to look into this with Nick & I think we may have figured out the
core of the problem...
The problem is easily reproducible by starting replication on the slave and
then sending a shutdown command to tomcat (e.g. catalina.sh stop).
With a debugger attached, it looks like the fsyncService thread is blocking VM
shutdown because it is created as a non-daemon thread.
Essentially what seems to be happening is that the fsyncService thread is
running when 'catalina.sh stop' is executed. This goes in and calls
SnapPuller.destroy() which aborts the current sync. Around line 517 of the
SnapPuller, there is code that is supposed to cleanup the fsyncService thread,
but I don't think it is getting executed because the thread that called
SnapPuller.fetchLatestIndex() is configured as a daemon Thread, so the JVM ends
up shutting that down before it can cleanup the fysncService...
So... it seems like:
{code}
if (fsyncService != null)
ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);
{code}
could be added around line 1706 of SnapPuller.java, or
{code}
// since otherwise the JVM may kill the thread before any cleanup
// code can be called
puller.setDaemon(false);
{code}
could be added around line 230 of ReplicationHandler.java, however this needs
some additional work (and I think it might need to be added regardless) since
the cleanup code in SnapPuller(around 517) that shuts down the fsync thread
never gets execute since logReplicationTimeAndConfFiles() can throw IO
exceptions bypassing the rest of the finally block...So the call to
{code}logReplicationTimeAndConfFiles(){code} around line 512 would need to get
wrapped with a try/catch block to catch the IO exception...
{quote}
I will attach a patch as well that I believe fixes the issues. Is there a
reason to *not* perform a hard shutdown of the fysnc thread? It seems sane
enough to me to do a hard shutdown of the fsyncService if the action is
aborted...
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]