[ 
https://issues.apache.org/jira/browse/SOLR-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar reassigned SOLR-6579:
-------------------------------------------

    Assignee: Shalin Shekhar Mangar

> SnapPuller Replication blocks clean shutdown of tomcat
> ------------------------------------------------------
>
>                 Key: SOLR-6579
>                 URL: https://issues.apache.org/jira/browse/SOLR-6579
>             Project: Solr
>          Issue Type: Bug
>    Affects Versions: 4.10.1
>            Reporter: Philip Black-Knight
>            Assignee: Shalin Shekhar Mangar
>         Attachments: cleanupSnapPullerFinally.patch
>
>
> main issue was described in the mailing list her:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201409.mbox/browser
> and 
> here:
> but also including the quotes:
> original message from Nick
> {quote}
> Hello,
> I have solr 4.10 running on tomcat 7. I'm doing replication from one master
> to about 10 slaves, with standard configuration:
> {code}
>       <requestHandler name="/replication" class="solr.ReplicationHandler" >
>            <lst name="master">
>              <str name="enable">${enable.master:false}</str>
>              <str name="replicateAfter">commit</str>
>              <str name="replicateAfter">startup</str>
>              <str name="confFiles">schema.xml,stopwords.txt</str>
>              <!-- str name="commitReserveDuration">00:00:10</str -->
>            </lst>
>            <lst name="slave">
>              <str name="enable">${enable.slave:false}</str>
>              <str name="masterUrl">http://master:8080/solr/mycore</str>
>              <str name="pollInterval">00:00:60</str>
>            </lst>
>       </requestHandler>
> {code}
> It appears that if tomcat gets shutdown while solr is replicating, it
> prevents tomcat from shutting down fully. Immediately after receiving the
> shutdown command, a thread dump is logged into catalina.out (this may have
> been turned on by some configuration someone else on my team made). I
> removed some threads that didn't look related, mostly about tomcat session
> replication, or with names like "http-bio-8080-exec-10".
> {code}
>     62252 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  –
> [mycore] webapp=/solr path=/replication
> params={command=details&_=1412014928648&wt=json} status=0 QTime=6
>     63310 [http-bio-8080-exec-1] INFO  org.apache.solr.core.SolrCore  –
> [mycore] webapp=/solr path=/replication
> params={command=details&_=1412014929699&wt=json} status=0 QTime=6
>     2014-09-29 14:22:10
>     Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed
> mode):
>     "fsyncService-12-thread-1" prio=10 tid=0x00007f3bd4002000 nid=0x203d
> waiting on condition [0x00007f3c271f0000]
>        java.lang.Thread.State: WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x00000007e1ff4458> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>             at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>             at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>             at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "explicit-fetchindex-cmd" daemon prio=10 tid=0x00007f3c0413e800
> nid=0x203c runnable [0x00007f3c272f1000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.SocketInputStream.socketRead0(Native Method)
>             at java.net.SocketInputStream.read(SocketInputStream.java:152)
>             at java.net.SocketInputStream.read(SocketInputStream.java:122)
>             at
> org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
>             at
> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:174)
>             at
> org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
>             at
> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
>             at
> org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:114)
>             at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:152)
>             at
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1239)
>             at
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1187)
>             at
> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:774)
>             at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:424)
>             at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
>             at
> org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)
>     "process reaper" daemon prio=10 tid=0x00007f3c0409c000 nid=0x203b
> waiting on condition [0x00007f3c984e9000]
>        java.lang.Thread.State: TIMED_WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x00000007dfbfd890> (a
> java.util.concurrent.SynchronousQueue$TransferStack)
>             at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>             at
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>             at
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>             at
> java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "process reaper" daemon prio=10 tid=0x00007f3c0409a000 nid=0x2039
> waiting on condition [0x00007f3cac040000]
>        java.lang.Thread.State: TIMED_WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x00000007dfbfd890> (a
> java.util.concurrent.SynchronousQueue$TransferStack)
>             at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>             at
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
>             at
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
>             at
> java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "ajp-bio-8009-AsyncTimeout" daemon prio=10 tid=0x00007f3ce013c000
> nid=0x202e waiting on condition [0x00007f3c27af9000]
>        java.lang.Thread.State: TIMED_WAITING (sleeping)
>             at java.lang.Thread.sleep(Native Method)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
>             at java.lang.Thread.run(Thread.java:745)
>     "ajp-bio-8009-Acceptor-0" daemon prio=10 tid=0x00007f3ce013a800
> nid=0x202d runnable [0x00007f3c985ea000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.PlainSocketImpl.socketAccept(Native Method)
>             at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>             at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>             at java.net.ServerSocket.accept(ServerSocket.java:498)
>             at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
>             at java.lang.Thread.run(Thread.java:745)
>     "http-bio-8080-exec-1" daemon prio=10 tid=0x00007f3c0c001800 nid=0x202c
> runnable [0x00007f3c986eb000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.SocketInputStream.socketRead0(Native Method)
>             at java.net.SocketInputStream.read(SocketInputStream.java:152)
>             at java.net.SocketInputStream.read(SocketInputStream.java:122)
>             at
> org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:516)
>             at
> org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:501)
>             at
> org.apache.coyote.http11.Http11Processor.setRequestLineReadTimeout(Http11Processor.java:173)
>             at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:927)
>             at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
>             - locked <0x00000007df0874b0> (a
> org.apache.tomcat.util.net.SocketWrapper)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "http-bio-8080-AsyncTimeout" daemon prio=10 tid=0x00007f3ce0139000
> nid=0x202b waiting on condition [0x00007f3cac1aa000]
>        java.lang.Thread.State: TIMED_WAITING (sleeping)
>             at java.lang.Thread.sleep(Native Method)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
>             at java.lang.Thread.run(Thread.java:745)
>     "http-bio-8080-Acceptor-0" daemon prio=10 tid=0x00007f3ce0138000
> nid=0x202a runnable [0x00007f3cac2ab000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.PlainSocketImpl.socketAccept(Native Method)
>             at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>             at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>             at java.net.ServerSocket.accept(ServerSocket.java:498)
>             at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
>             at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
>             at java.lang.Thread.run(Thread.java:745)
>     "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10
> tid=0x00007f3ce05b4000 nid=0x2029 waiting on condition [0x00007f3cacbb7000]
>        java.lang.Thread.State: TIMED_WAITING (sleeping)
>             at java.lang.Thread.sleep(Native Method)
>             at
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1508)
>             at java.lang.Thread.run(Thread.java:745)
>     "clock:10000" daemon prio=10 tid=0x00007f3c5cafc000 nid=0x2028 in
> Object.wait() [0x00007f3c27bfa000]
>        java.lang.Thread.State: TIMED_WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x00000007dc6cf7e8> (a
> org.webmacro.util.Clock$ClockThread)
>             at org.webmacro.util.Clock$ClockThread.run(Clock.java:108)
>             - locked <0x00000007dc6cf7e8> (a
> org.webmacro.util.Clock$ClockThread)
>     "TimeLoop(1000,600)" daemon prio=10 tid=0x00007f3c5cac6000 nid=0x2027
> waiting on condition [0x00007f3c27cfb000]
>        java.lang.Thread.State: TIMED_WAITING (sleeping)
>             at java.lang.Thread.sleep(Native Method)
>             at
> org.webmacro.util.TimeLoop$TimeLoopThread.run(TimeLoop.java:237)
>     "pool-1-thread-2" daemon prio=10 tid=0x00007f3c5d1b3800 nid=0x2022
> waiting on condition [0x00007f3c98246000]
>        java.lang.Thread.State: WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x000000070b90b4a0> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>             at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>             at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>             at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "Java2D Disposer" daemon prio=10 tid=0x00007f3c5cfeb800 nid=0x2021 in
> Object.wait() [0x00007f3c9835d000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x00000007e6030140> (a
> java.lang.ref.ReferenceQueue$Lock)
>             at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>             - locked <0x00000007e6030140> (a
> java.lang.ref.ReferenceQueue$Lock)
>             at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>             at sun.java2d.Disposer.run(Disposer.java:145)
>             at java.lang.Thread.run(Thread.java:745)
>     "MySQL Statement Cancellation Timer" daemon prio=10
> tid=0x00007f3c5ca0c000 nid=0x2020 in Object.wait() [0x00007f3c9845e000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x00000007e5f45ad8> (a java.util.TaskQueue)
>             at java.lang.Object.wait(Object.java:503)
>             at java.util.TimerThread.mainLoop(Timer.java:526)
>             - locked <0x00000007e5f45ad8> (a java.util.TaskQueue)
>             at java.util.TimerThread.run(Timer.java:505)
>     "Timer-0" daemon prio=10 tid=0x00007f3c5c9f7800 nid=0x201f in
> Object.wait() [0x00007f3cad0bc000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x00000007e605a8d8> (a java.util.TaskQueue)
>             at java.lang.Object.wait(Object.java:503)
>             at java.util.TimerThread.mainLoop(Timer.java:526)
>             - locked <0x00000007e605a8d8> (a java.util.TaskQueue)
>             at java.util.TimerThread.run(Timer.java:505)
>     "Thread-5" daemon prio=10 tid=0x00007f3c5c00f800 nid=0x201e in
> Object.wait() [0x00007f3cac3ac000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x000000070b92b230> (a java.lang.Object)
>             at java.lang.Object.wait(Object.java:503)
>             at org.apache.solr.core.CloserThread.run(CoreContainer.java:905)
>             - locked <0x000000070b92b230> (a java.lang.Object)
>     "snapPuller-10-thread-1" prio=10 tid=0x00007f3c6c26e800 nid=0x201d
> waiting on condition [0x00007f3cac555000]
>        java.lang.Thread.State: TIMED_WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x000000070b92b2a8> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>             at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>             at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>             at
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
>             at
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "searcherExecutor-6-thread-1" prio=10 tid=0x00007f3c6c1ea000 nid=0x201c
> waiting on condition [0x00007f3cac810000]
>        java.lang.Thread.State: WAITING (parking)
>             at sun.misc.Unsafe.park(Native Method)
>             - parking to wait for  <0x000000070b92b440> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>             at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>             at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>             at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
>             at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     "GC Daemon" daemon prio=10 tid=0x00007f3ce0536000 nid=0x200d in
> Object.wait() [0x00007f3cd8137000]
>        java.lang.Thread.State: TIMED_WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
>             at sun.misc.GC$Daemon.run(GC.java:117)
>             - locked <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
>     "Service Thread" daemon prio=10 tid=0x00007f3ce00a6000 nid=0x200b
> runnable [0x0000000000000000]
>        java.lang.Thread.State: RUNNABLE
>     "C2 CompilerThread1" daemon prio=10 tid=0x00007f3ce00a4000 nid=0x200a
> waiting on condition [0x0000000000000000]
>        java.lang.Thread.State: RUNNABLE
>     "C2 CompilerThread0" daemon prio=10 tid=0x00007f3ce00a1000 nid=0x2009
> waiting on condition [0x0000000000000000]
>        java.lang.Thread.State: RUNNABLE
>     "Signal Dispatcher" daemon prio=10 tid=0x00007f3ce009f800 nid=0x2008
> waiting on condition [0x0000000000000000]
>        java.lang.Thread.State: RUNNABLE
>     "Finalizer" daemon prio=10 tid=0x00007f3ce0077800 nid=0x2007 in
> Object.wait() [0x00007f3cd8c1b000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x000000070bbee828> (a
> java.lang.ref.ReferenceQueue$Lock)
>             at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
>             - locked <0x000000070bbee828> (a
> java.lang.ref.ReferenceQueue$Lock)
>             at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
>             at
> java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
>     "Reference Handler" daemon prio=10 tid=0x00007f3ce0075800 nid=0x2006 in
> Object.wait() [0x00007f3cadffe000]
>        java.lang.Thread.State: WAITING (on object monitor)
>             at java.lang.Object.wait(Native Method)
>             - waiting on <0x000000070b8b23b8> (a
> java.lang.ref.Reference$Lock)
>             at java.lang.Object.wait(Object.java:503)
>             at
> java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
>             - locked <0x000000070b8b23b8> (a java.lang.ref.Reference$Lock)
>     "main" prio=10 tid=0x00007f3ce0009000 nid=0x1ffc runnable
> [0x00007f3ce675c000]
>        java.lang.Thread.State: RUNNABLE
>             at java.net.PlainSocketImpl.socketAccept(Native Method)
>             at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
>             at java.net.ServerSocket.implAccept(ServerSocket.java:530)
>             at java.net.ServerSocket.accept(ServerSocket.java:498)
>             at
> org.apache.catalina.core.StandardServer.await(StandardServer.java:452)
>             at org.apache.catalina.startup.Catalina.await(Catalina.java:766)
>             at org.apache.catalina.startup.Catalina.start(Catalina.java:712)
>             at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>             at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>             at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>             at java.lang.reflect.Method.invoke(Method.java:606)
>             at
> org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322)
>             at
> org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:456)
>     JNI global references: 422
>     Heap
>      PSYoungGen      total 1184256K, used 1031985K [0x00000007aaa80000,
> 0x0000000800000000, 0x0000000800000000)
>       eden space 970240K, 94% used
> [0x00000007aaa80000,0x00000007e259e740,0x00000007e5e00000)
>       from space 214016K, 55% used
> [0x00000007e5e00000,0x00000007ed2adfd0,0x00000007f2f00000)
>       to   space 212480K, 0% used
> [0x00000007f3080000,0x00000007f3080000,0x0000000800000000)
>      ParOldGen       total 1398272K, used 209681K [0x0000000700000000,
> 0x0000000755580000, 0x00000007aaa80000)
>       object space 1398272K, 14% used
> [0x0000000700000000,0x000000070ccc4700,0x0000000755580000)
>      PSPermGen       total 78848K, used 78659K [0x00000006c0000000,
> 0x00000006c4d00000, 0x0000000700000000)
>       object space 78848K, 99% used
> [0x00000006c0000000,0x00000006c4cd0ca8,0x00000006c4d00000)
> right after this, here comes the shutdown:
>     Sep 29, 2014 2:22:10 PM org.apache.catalina.core.StandardServer await
>     INFO: A valid shutdown command was received via the shutdown port.
> Stopping the Server instance.
>     Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
>     INFO: Pausing ProtocolHandler ["http-bio-8080"]
>     Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
>     INFO: Pausing ProtocolHandler ["ajp-bio-8009"]
>     Sep 29, 2014 2:22:11 PM org.apache.catalina.core.StandardService
> stopInternal
>     INFO: Stopping service Catalina
>     Sep 29, 2014 2:22:11 PM
> org.apache.catalina.ha.session.JvmRouteBinderValve stopInternal
>     INFO: JvmRouteBinderValve stopped
>     Sep 29, 2014 2:22:11 PM org.apache.catalina.ha.session.DeltaManager
> stopInternal
>     INFO: Manager [localhost#/dy] expiring sessions upon shutdown
> (snip some stuff about an unrelated webapp, then the solr errors)
>     64876 [localhost-startStop-2] INFO  org.apache.solr.core.CoreContainer
>  – Shutting down CoreContainer instance=1359815787
>     64877 [localhost-startStop-2] INFO  org.apache.solr.core.SolrCore  –
> [mycore]  CLOSING SolrCore org.apache.solr.core.SolrCore@1ade5d50
>     64882 [localhost-startStop-2] INFO
>  org.apache.solr.update.UpdateHandler  – closing
> DirectUpdateHandler2{commits=0,autocommit
> maxTime=15000ms,autocommits=0,soft
> autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
>     64883 [localhost-startStop-2] INFO
>  org.apache.solr.update.SolrCoreState  – Closing SolrCoreState
>     64883 [localhost-startStop-2] INFO
>  org.apache.solr.update.DefaultSolrCoreState  – SolrCoreState ref count has
> reached 0 - closing IndexWriter
>     64883 [localhost-startStop-2] INFO
>  org.apache.solr.update.DefaultSolrCoreState  – closing IndexWriter with
> IndexWriterCloser
>     64885 [localhost-startStop-2] ERROR
> org.apache.solr.update.DefaultSolrCoreState  – Error during shutdown of
> writer.
>     org.apache.lucene.store.AlreadyClosedException: this IndexWriter is
> closed
>             at
> org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:700)
>             at
> org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2384)
>             at
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:795)
>             at
> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:70)
>             at
> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:371)
>             at
> org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:72)
>             at org.apache.solr.core.SolrCore.close(SolrCore.java:1073)
>             at org.apache.solr.core.SolrCores.close(SolrCores.java:117)
>             at
> org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:347)
>             at
> org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:200)
>             at
> org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:315)
>             at
> org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4782)
>             at
> org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5565)
>             at
> org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
>             at
> org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1575)
>             at
> org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1564)
>             at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>             at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>             at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>             at java.lang.Thread.run(Thread.java:745)
>     2014-09-29T14:22:11.384-0400: [GC 1299401K->384944K(2543104K),
> 0.0427280 secs]
>     64929 [localhost-startStop-2] INFO  org.apache.solr.core.SolrCore  –
> [mycore] Closing main searcher on request.
>     64935 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Closing
> NRTCachingDirectoryFactory - 3 directories currently being tracked
>     69937 [explicit-fetchindex-cmd] INFO
>  org.apache.solr.update.DefaultSolrCoreState  – Creating new IndexWriter...
>     69938 [explicit-fetchindex-cmd] ERROR
> org.apache.solr.handler.ReplicationHandler  – SnapPull failed
> :org.apache.lucene.store.AlreadyClosedException: Already closed
>             at
> org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:341)
>             at
> org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:827)
>             at
> org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:569)
>             at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:511)
>             at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
>             at
> org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data;done=false>>]
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index;done=false>>]
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447;done=true>>]
>     69942 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
>     69943 [localhost-startStop-2] INFO
>  org.apache.solr.core.CachingDirectoryFactory  – Removing directory before
> core close:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
>     Sep 29, 2014 2:22:16 PM org.apache.catalina.ha.session.DeltaManager
> stopInternal
>     INFO: Manager [localhost#/solr] expiring sessions upon shutdown
>     Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
> clearReferencesThreads
>     SEVERE: The web application [/solr] appears to have started a thread
> named [fsyncService-12-thread-1] but has failed to stop it. This is very
> likely to create a memory leak.
>     Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
> checkThreadLocalMapForLeaks
>     SEVERE: The web application [/solr] created a ThreadLocal with key of
> type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
> [org.apache.solr.schema.DateField$ThreadLocalDateFormat@66271f62]) and a
> value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
> (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
> but failed to remove it when the web application was stopped. Threads are
> going to be renewed over time to try and avoid a probable memory leak.
>     Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
>     INFO: Stopping ProtocolHandler ["http-bio-8080"]
>     Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
>     INFO: Stopping ProtocolHandler ["ajp-bio-8009"]
>     Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
>     INFO: Destroying ProtocolHandler ["http-bio-8080"]
>     Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
>     INFO: Destroying ProtocolHandler ["ajp-bio-8009"]
> {code}
> At this point it looks like tomcat has shutdown, but the process is still
> running, as checked by ps. If I start a new tomcat and repeat this process
> a few times, we end up with several zombie instances of tomcat running.
> Is this designed behavior? What is supposed to happen when the container is
> shutdown while replication is running? How are others handling this?
> Please let me know if I can provide any more information.
> Thanks in advance,
> Nick
> {quote}
> My followup
> {quote}
> I was helping to look into this with Nick & I think we may have figured out 
> the core of the problem...
> The problem is easily reproducible by starting replication on the slave and 
> then sending a shutdown command to tomcat (e.g. catalina.sh stop).
> With a debugger attached, it looks like the fsyncService thread is blocking 
> VM shutdown because it is created as a non-daemon thread.
> Essentially what seems to be happening is that the fsyncService thread is 
> running when 'catalina.sh stop' is executed. This goes in and calls 
> SnapPuller.destroy() which aborts the current sync. Around line 517 of the 
> SnapPuller, there is code that is supposed to cleanup the fsyncService 
> thread, but I don't think it is getting executed because the thread that 
> called SnapPuller.fetchLatestIndex() is configured as a daemon Thread, so the 
> JVM ends up shutting that down before it can cleanup the fysncService...
> So... it seems like:
> {code}
>     if (fsyncService != null) 
> ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);
> {code}
> could be added around line 1706 of SnapPuller.java,  or
> {code}
>            // since otherwise the JVM may kill the thread before any cleanup
>            // code can be called
>           puller.setDaemon(false);
> {code}
> could be added around line 230 of ReplicationHandler.java, however this needs 
> some additional work (and I think it might need to be added regardless) since 
> the cleanup code in SnapPuller(around 517) that shuts down the fsync thread 
> never gets execute since logReplicationTimeAndConfFiles() can throw IO 
> exceptions bypassing the rest of the finally block...So the call to 
> {code}logReplicationTimeAndConfFiles(){code} around line 512 would need to 
> get wrapped with a try/catch block to catch the IO exception...
> {quote}
> I will attach a patch as well that I believe fixes the issues. Is there a 
> reason to *not* perform a hard shutdown of the fysnc thread? It seems sane 
> enough to me to do a hard shutdown of the fsyncService if the action is 
> aborted...



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to