[
https://issues.apache.org/jira/browse/SOLR-6579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Noble Paul reassigned SOLR-6579:
--------------------------------
Assignee: Noble Paul (was: Shalin Shekhar Mangar)
> SnapPuller Replication blocks clean shutdown of tomcat
> ------------------------------------------------------
>
> Key: SOLR-6579
> URL: https://issues.apache.org/jira/browse/SOLR-6579
> Project: Solr
> Issue Type: Bug
> Affects Versions: 4.10.1
> Reporter: Philip Black-Knight
> Assignee: Noble Paul
> Attachments: cleanupSnapPullerFinally.patch
>
>
> main issue was described in the mailing list here:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201409.mbox/browser
> and
> here:
> but also including the quotes:
> original message from Nick
> {quote}
> Hello,
> I have solr 4.10 running on tomcat 7. I'm doing replication from one master
> to about 10 slaves, with standard configuration:
> {code}
> <requestHandler name="/replication" class="solr.ReplicationHandler" >
> <lst name="master">
> <str name="enable">${enable.master:false}</str>
> <str name="replicateAfter">commit</str>
> <str name="replicateAfter">startup</str>
> <str name="confFiles">schema.xml,stopwords.txt</str>
> <!-- str name="commitReserveDuration">00:00:10</str -->
> </lst>
> <lst name="slave">
> <str name="enable">${enable.slave:false}</str>
> <str name="masterUrl">http://master:8080/solr/mycore</str>
> <str name="pollInterval">00:00:60</str>
> </lst>
> </requestHandler>
> {code}
> It appears that if tomcat gets shutdown while solr is replicating, it
> prevents tomcat from shutting down fully. Immediately after receiving the
> shutdown command, a thread dump is logged into catalina.out (this may have
> been turned on by some configuration someone else on my team made). I
> removed some threads that didn't look related, mostly about tomcat session
> replication, or with names like "http-bio-8080-exec-10".
> {code}
> 62252 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore –
> [mycore] webapp=/solr path=/replication
> params={command=details&_=1412014928648&wt=json} status=0 QTime=6
> 63310 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore –
> [mycore] webapp=/solr path=/replication
> params={command=details&_=1412014929699&wt=json} status=0 QTime=6
> 2014-09-29 14:22:10
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed
> mode):
> "fsyncService-12-thread-1" prio=10 tid=0x00007f3bd4002000 nid=0x203d
> waiting on condition [0x00007f3c271f0000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007e1ff4458> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "explicit-fetchindex-cmd" daemon prio=10 tid=0x00007f3c0413e800
> nid=0x203c runnable [0x00007f3c272f1000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at
> org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:198)
> at
> org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:174)
> at
> org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
> at
> org.apache.solr.common.util.FastInputStream.readWrappedStream(FastInputStream.java:80)
> at
> org.apache.solr.common.util.FastInputStream.read(FastInputStream.java:114)
> at
> org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:152)
> at
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchPackets(SnapPuller.java:1239)
> at
> org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1187)
> at
> org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:774)
> at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:424)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
> at
> org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)
> "process reaper" daemon prio=10 tid=0x00007f3c0409c000 nid=0x203b
> waiting on condition [0x00007f3c984e9000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007dfbfd890> (a
> java.util.concurrent.SynchronousQueue$TransferStack)
> at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
> at
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
> at
> java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "process reaper" daemon prio=10 tid=0x00007f3c0409a000 nid=0x2039
> waiting on condition [0x00007f3cac040000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x00000007dfbfd890> (a
> java.util.concurrent.SynchronousQueue$TransferStack)
> at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at
> java.util.concurrent.SynchronousQueue$TransferStack.awaitFulfill(SynchronousQueue.java:460)
> at
> java.util.concurrent.SynchronousQueue$TransferStack.transfer(SynchronousQueue.java:359)
> at
> java.util.concurrent.SynchronousQueue.poll(SynchronousQueue.java:942)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "ajp-bio-8009-AsyncTimeout" daemon prio=10 tid=0x00007f3ce013c000
> nid=0x202e waiting on condition [0x00007f3c27af9000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at
> org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
> at java.lang.Thread.run(Thread.java:745)
> "ajp-bio-8009-Acceptor-0" daemon prio=10 tid=0x00007f3ce013a800
> nid=0x202d runnable [0x00007f3c985ea000]
> java.lang.Thread.State: RUNNABLE
> at java.net.PlainSocketImpl.socketAccept(Native Method)
> at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
> at java.net.ServerSocket.implAccept(ServerSocket.java:530)
> at java.net.ServerSocket.accept(ServerSocket.java:498)
> at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
> at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
> at java.lang.Thread.run(Thread.java:745)
> "http-bio-8080-exec-1" daemon prio=10 tid=0x00007f3c0c001800 nid=0x202c
> runnable [0x00007f3c986eb000]
> java.lang.Thread.State: RUNNABLE
> at java.net.SocketInputStream.socketRead0(Native Method)
> at java.net.SocketInputStream.read(SocketInputStream.java:152)
> at java.net.SocketInputStream.read(SocketInputStream.java:122)
> at
> org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:516)
> at
> org.apache.coyote.http11.InternalInputBuffer.fill(InternalInputBuffer.java:501)
> at
> org.apache.coyote.http11.Http11Processor.setRequestLineReadTimeout(Http11Processor.java:173)
> at
> org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:927)
> at
> org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
> at
> org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:312)
> - locked <0x00000007df0874b0> (a
> org.apache.tomcat.util.net.SocketWrapper)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "http-bio-8080-AsyncTimeout" daemon prio=10 tid=0x00007f3ce0139000
> nid=0x202b waiting on condition [0x00007f3cac1aa000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at
> org.apache.tomcat.util.net.JIoEndpoint$AsyncTimeout.run(JIoEndpoint.java:148)
> at java.lang.Thread.run(Thread.java:745)
> "http-bio-8080-Acceptor-0" daemon prio=10 tid=0x00007f3ce0138000
> nid=0x202a runnable [0x00007f3cac2ab000]
> java.lang.Thread.State: RUNNABLE
> at java.net.PlainSocketImpl.socketAccept(Native Method)
> at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
> at java.net.ServerSocket.implAccept(ServerSocket.java:530)
> at java.net.ServerSocket.accept(ServerSocket.java:498)
> at
> org.apache.tomcat.util.net.DefaultServerSocketFactory.acceptSocket(DefaultServerSocketFactory.java:60)
> at
> org.apache.tomcat.util.net.JIoEndpoint$Acceptor.run(JIoEndpoint.java:216)
> at java.lang.Thread.run(Thread.java:745)
> "ContainerBackgroundProcessor[StandardEngine[Catalina]]" daemon prio=10
> tid=0x00007f3ce05b4000 nid=0x2029 waiting on condition [0x00007f3cacbb7000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at
> org.apache.catalina.core.ContainerBase$ContainerBackgroundProcessor.run(ContainerBase.java:1508)
> at java.lang.Thread.run(Thread.java:745)
> "clock:10000" daemon prio=10 tid=0x00007f3c5cafc000 nid=0x2028 in
> Object.wait() [0x00007f3c27bfa000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007dc6cf7e8> (a
> org.webmacro.util.Clock$ClockThread)
> at org.webmacro.util.Clock$ClockThread.run(Clock.java:108)
> - locked <0x00000007dc6cf7e8> (a
> org.webmacro.util.Clock$ClockThread)
> "TimeLoop(1000,600)" daemon prio=10 tid=0x00007f3c5cac6000 nid=0x2027
> waiting on condition [0x00007f3c27cfb000]
> java.lang.Thread.State: TIMED_WAITING (sleeping)
> at java.lang.Thread.sleep(Native Method)
> at
> org.webmacro.util.TimeLoop$TimeLoopThread.run(TimeLoop.java:237)
> "pool-1-thread-2" daemon prio=10 tid=0x00007f3c5d1b3800 nid=0x2022
> waiting on condition [0x00007f3c98246000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x000000070b90b4a0> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "Java2D Disposer" daemon prio=10 tid=0x00007f3c5cfeb800 nid=0x2021 in
> Object.wait() [0x00007f3c9835d000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007e6030140> (a
> java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
> - locked <0x00000007e6030140> (a
> java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
> at sun.java2d.Disposer.run(Disposer.java:145)
> at java.lang.Thread.run(Thread.java:745)
> "MySQL Statement Cancellation Timer" daemon prio=10
> tid=0x00007f3c5ca0c000 nid=0x2020 in Object.wait() [0x00007f3c9845e000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007e5f45ad8> (a java.util.TaskQueue)
> at java.lang.Object.wait(Object.java:503)
> at java.util.TimerThread.mainLoop(Timer.java:526)
> - locked <0x00000007e5f45ad8> (a java.util.TaskQueue)
> at java.util.TimerThread.run(Timer.java:505)
> "Timer-0" daemon prio=10 tid=0x00007f3c5c9f7800 nid=0x201f in
> Object.wait() [0x00007f3cad0bc000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x00000007e605a8d8> (a java.util.TaskQueue)
> at java.lang.Object.wait(Object.java:503)
> at java.util.TimerThread.mainLoop(Timer.java:526)
> - locked <0x00000007e605a8d8> (a java.util.TaskQueue)
> at java.util.TimerThread.run(Timer.java:505)
> "Thread-5" daemon prio=10 tid=0x00007f3c5c00f800 nid=0x201e in
> Object.wait() [0x00007f3cac3ac000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x000000070b92b230> (a java.lang.Object)
> at java.lang.Object.wait(Object.java:503)
> at org.apache.solr.core.CloserThread.run(CoreContainer.java:905)
> - locked <0x000000070b92b230> (a java.lang.Object)
> "snapPuller-10-thread-1" prio=10 tid=0x00007f3c6c26e800 nid=0x201d
> waiting on condition [0x00007f3cac555000]
> java.lang.Thread.State: TIMED_WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x000000070b92b2a8> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at
> java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "searcherExecutor-6-thread-1" prio=10 tid=0x00007f3c6c1ea000 nid=0x201c
> waiting on condition [0x00007f3cac810000]
> java.lang.Thread.State: WAITING (parking)
> at sun.misc.Unsafe.park(Native Method)
> - parking to wait for <0x000000070b92b440> (a
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
> at
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
> at
> java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
> at
> java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> "GC Daemon" daemon prio=10 tid=0x00007f3ce0536000 nid=0x200d in
> Object.wait() [0x00007f3cd8137000]
> java.lang.Thread.State: TIMED_WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
> at sun.misc.GC$Daemon.run(GC.java:117)
> - locked <0x000000070bbee5b0> (a sun.misc.GC$LatencyLock)
> "Service Thread" daemon prio=10 tid=0x00007f3ce00a6000 nid=0x200b
> runnable [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread1" daemon prio=10 tid=0x00007f3ce00a4000 nid=0x200a
> waiting on condition [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
> "C2 CompilerThread0" daemon prio=10 tid=0x00007f3ce00a1000 nid=0x2009
> waiting on condition [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
> "Signal Dispatcher" daemon prio=10 tid=0x00007f3ce009f800 nid=0x2008
> waiting on condition [0x0000000000000000]
> java.lang.Thread.State: RUNNABLE
> "Finalizer" daemon prio=10 tid=0x00007f3ce0077800 nid=0x2007 in
> Object.wait() [0x00007f3cd8c1b000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x000000070bbee828> (a
> java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:135)
> - locked <0x000000070bbee828> (a
> java.lang.ref.ReferenceQueue$Lock)
> at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:151)
> at
> java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:209)
> "Reference Handler" daemon prio=10 tid=0x00007f3ce0075800 nid=0x2006 in
> Object.wait() [0x00007f3cadffe000]
> java.lang.Thread.State: WAITING (on object monitor)
> at java.lang.Object.wait(Native Method)
> - waiting on <0x000000070b8b23b8> (a
> java.lang.ref.Reference$Lock)
> at java.lang.Object.wait(Object.java:503)
> at
> java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
> - locked <0x000000070b8b23b8> (a java.lang.ref.Reference$Lock)
> "main" prio=10 tid=0x00007f3ce0009000 nid=0x1ffc runnable
> [0x00007f3ce675c000]
> java.lang.Thread.State: RUNNABLE
> at java.net.PlainSocketImpl.socketAccept(Native Method)
> at
> java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:398)
> at java.net.ServerSocket.implAccept(ServerSocket.java:530)
> at java.net.ServerSocket.accept(ServerSocket.java:498)
> at
> org.apache.catalina.core.StandardServer.await(StandardServer.java:452)
> at org.apache.catalina.startup.Catalina.await(Catalina.java:766)
> at org.apache.catalina.startup.Catalina.start(Catalina.java:712)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:322)
> at
> org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:456)
> JNI global references: 422
> Heap
> PSYoungGen total 1184256K, used 1031985K [0x00000007aaa80000,
> 0x0000000800000000, 0x0000000800000000)
> eden space 970240K, 94% used
> [0x00000007aaa80000,0x00000007e259e740,0x00000007e5e00000)
> from space 214016K, 55% used
> [0x00000007e5e00000,0x00000007ed2adfd0,0x00000007f2f00000)
> to space 212480K, 0% used
> [0x00000007f3080000,0x00000007f3080000,0x0000000800000000)
> ParOldGen total 1398272K, used 209681K [0x0000000700000000,
> 0x0000000755580000, 0x00000007aaa80000)
> object space 1398272K, 14% used
> [0x0000000700000000,0x000000070ccc4700,0x0000000755580000)
> PSPermGen total 78848K, used 78659K [0x00000006c0000000,
> 0x00000006c4d00000, 0x0000000700000000)
> object space 78848K, 99% used
> [0x00000006c0000000,0x00000006c4cd0ca8,0x00000006c4d00000)
> right after this, here comes the shutdown:
> Sep 29, 2014 2:22:10 PM org.apache.catalina.core.StandardServer await
> INFO: A valid shutdown command was received via the shutdown port.
> Stopping the Server instance.
> Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
> INFO: Pausing ProtocolHandler ["http-bio-8080"]
> Sep 29, 2014 2:22:10 PM org.apache.coyote.AbstractProtocol pause
> INFO: Pausing ProtocolHandler ["ajp-bio-8009"]
> Sep 29, 2014 2:22:11 PM org.apache.catalina.core.StandardService
> stopInternal
> INFO: Stopping service Catalina
> Sep 29, 2014 2:22:11 PM
> org.apache.catalina.ha.session.JvmRouteBinderValve stopInternal
> INFO: JvmRouteBinderValve stopped
> Sep 29, 2014 2:22:11 PM org.apache.catalina.ha.session.DeltaManager
> stopInternal
> INFO: Manager [localhost#/dy] expiring sessions upon shutdown
> (snip some stuff about an unrelated webapp, then the solr errors)
> 64876 [localhost-startStop-2] INFO org.apache.solr.core.CoreContainer
> – Shutting down CoreContainer instance=1359815787
> 64877 [localhost-startStop-2] INFO org.apache.solr.core.SolrCore –
> [mycore] CLOSING SolrCore org.apache.solr.core.SolrCore@1ade5d50
> 64882 [localhost-startStop-2] INFO
> org.apache.solr.update.UpdateHandler – closing
> DirectUpdateHandler2{commits=0,autocommit
> maxTime=15000ms,autocommits=0,soft
> autocommits=0,optimizes=0,rollbacks=0,expungeDeletes=0,docsPending=0,adds=0,deletesById=0,deletesByQuery=0,errors=0,cumulative_adds=0,cumulative_deletesById=0,cumulative_deletesByQuery=0,cumulative_errors=0,transaction_logs_total_size=0,transaction_logs_total_number=0}
> 64883 [localhost-startStop-2] INFO
> org.apache.solr.update.SolrCoreState – Closing SolrCoreState
> 64883 [localhost-startStop-2] INFO
> org.apache.solr.update.DefaultSolrCoreState – SolrCoreState ref count has
> reached 0 - closing IndexWriter
> 64883 [localhost-startStop-2] INFO
> org.apache.solr.update.DefaultSolrCoreState – closing IndexWriter with
> IndexWriterCloser
> 64885 [localhost-startStop-2] ERROR
> org.apache.solr.update.DefaultSolrCoreState – Error during shutdown of
> writer.
> org.apache.lucene.store.AlreadyClosedException: this IndexWriter is
> closed
> at
> org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:700)
> at
> org.apache.lucene.index.IndexWriter.waitForMerges(IndexWriter.java:2384)
> at
> org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:795)
> at
> org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:70)
> at
> org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:371)
> at
> org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:72)
> at org.apache.solr.core.SolrCore.close(SolrCore.java:1073)
> at org.apache.solr.core.SolrCores.close(SolrCores.java:117)
> at
> org.apache.solr.core.CoreContainer.shutdown(CoreContainer.java:347)
> at
> org.apache.solr.servlet.SolrDispatchFilter.destroy(SolrDispatchFilter.java:200)
> at
> org.apache.catalina.core.ApplicationFilterConfig.release(ApplicationFilterConfig.java:315)
> at
> org.apache.catalina.core.StandardContext.filterStop(StandardContext.java:4782)
> at
> org.apache.catalina.core.StandardContext.stopInternal(StandardContext.java:5565)
> at
> org.apache.catalina.util.LifecycleBase.stop(LifecycleBase.java:232)
> at
> org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1575)
> at
> org.apache.catalina.core.ContainerBase$StopChild.call(ContainerBase.java:1564)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> 2014-09-29T14:22:11.384-0400: [GC 1299401K->384944K(2543104K),
> 0.0427280 secs]
> 64929 [localhost-startStop-2] INFO org.apache.solr.core.SolrCore –
> [mycore] Closing main searcher on request.
> 64935 [localhost-startStop-2] INFO
> org.apache.solr.core.CachingDirectoryFactory – Closing
> NRTCachingDirectoryFactory - 3 directories currently being tracked
> 69937 [explicit-fetchindex-cmd] INFO
> org.apache.solr.update.DefaultSolrCoreState – Creating new IndexWriter...
> 69938 [explicit-fetchindex-cmd] ERROR
> org.apache.solr.handler.ReplicationHandler – SnapPull failed
> :org.apache.lucene.store.AlreadyClosedException: Already closed
> at
> org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:341)
> at
> org.apache.solr.handler.ReplicationHandler.loadReplicationProperties(ReplicationHandler.java:827)
> at
> org.apache.solr.handler.SnapPuller.logReplicationTimeAndConfFiles(SnapPuller.java:569)
> at
> org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:511)
> at
> org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:337)
> at
> org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:227)
> 69942 [localhost-startStop-2] INFO
> org.apache.solr.core.CachingDirectoryFactory – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data;done=false>>]
> 69942 [localhost-startStop-2] INFO
> org.apache.solr.core.CachingDirectoryFactory – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data
> 69942 [localhost-startStop-2] INFO
> org.apache.solr.core.CachingDirectoryFactory – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index;done=false>>]
> 69942 [localhost-startStop-2] INFO
> org.apache.solr.core.CachingDirectoryFactory – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index
> 69942 [localhost-startStop-2] INFO
> org.apache.solr.core.CachingDirectoryFactory – looking to close
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
> [CachedDir<<refCount=0;path=/usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447;done=true>>]
> 69942 [localhost-startStop-2] INFO
> org.apache.solr.core.CachingDirectoryFactory – Closing directory:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
> 69943 [localhost-startStop-2] INFO
> org.apache.solr.core.CachingDirectoryFactory – Removing directory before
> core close:
> /usr/local/apache-tomcat-7.0.39/solr/mycore/data/index.20140929142203447
> Sep 29, 2014 2:22:16 PM org.apache.catalina.ha.session.DeltaManager
> stopInternal
> INFO: Manager [localhost#/solr] expiring sessions upon shutdown
> Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
> clearReferencesThreads
> SEVERE: The web application [/solr] appears to have started a thread
> named [fsyncService-12-thread-1] but has failed to stop it. This is very
> likely to create a memory leak.
> Sep 29, 2014 2:22:16 PM org.apache.catalina.loader.WebappClassLoader
> checkThreadLocalMapForLeaks
> SEVERE: The web application [/solr] created a ThreadLocal with key of
> type [org.apache.solr.schema.DateField.ThreadLocalDateFormat] (value
> [org.apache.solr.schema.DateField$ThreadLocalDateFormat@66271f62]) and a
> value of type [org.apache.solr.schema.DateField.ISO8601CanonicalDateFormat]
> (value [org.apache.solr.schema.DateField$ISO8601CanonicalDateFormat@6b2ed43a])
> but failed to remove it when the web application was stopped. Threads are
> going to be renewed over time to try and avoid a probable memory leak.
> Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
> INFO: Stopping ProtocolHandler ["http-bio-8080"]
> Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol stop
> INFO: Stopping ProtocolHandler ["ajp-bio-8009"]
> Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
> INFO: Destroying ProtocolHandler ["http-bio-8080"]
> Sep 29, 2014 2:22:16 PM org.apache.coyote.AbstractProtocol destroy
> INFO: Destroying ProtocolHandler ["ajp-bio-8009"]
> {code}
> At this point it looks like tomcat has shutdown, but the process is still
> running, as checked by ps. If I start a new tomcat and repeat this process
> a few times, we end up with several zombie instances of tomcat running.
> Is this designed behavior? What is supposed to happen when the container is
> shutdown while replication is running? How are others handling this?
> Please let me know if I can provide any more information.
> Thanks in advance,
> Nick
> {quote}
> My followup
> {quote}
> I was helping to look into this with Nick & I think we may have figured out
> the core of the problem...
> The problem is easily reproducible by starting replication on the slave and
> then sending a shutdown command to tomcat (e.g. catalina.sh stop).
> With a debugger attached, it looks like the fsyncService thread is blocking
> VM shutdown because it is created as a non-daemon thread.
> Essentially what seems to be happening is that the fsyncService thread is
> running when 'catalina.sh stop' is executed. This goes in and calls
> SnapPuller.destroy() which aborts the current sync. Around line 517 of the
> SnapPuller, there is code that is supposed to cleanup the fsyncService
> thread, but I don't think it is getting executed because the thread that
> called SnapPuller.fetchLatestIndex() is configured as a daemon Thread, so the
> JVM ends up shutting that down before it can cleanup the fysncService...
> So... it seems like:
> {code}
> if (fsyncService != null)
> ExecutorUtil.shutdownNowAndAwaitTermination(fsyncService);
> {code}
> could be added around line 1706 of SnapPuller.java, or
> {code}
> // since otherwise the JVM may kill the thread before any cleanup
> // code can be called
> puller.setDaemon(false);
> {code}
> could be added around line 230 of ReplicationHandler.java, however this needs
> some additional work (and I think it might need to be added regardless) since
> the cleanup code in SnapPuller(around 517) that shuts down the fsync thread
> never gets execute since logReplicationTimeAndConfFiles() can throw IO
> exceptions bypassing the rest of the finally block...So the call to
> {code}logReplicationTimeAndConfFiles(){code} around line 512 would need to
> get wrapped with a try/catch block to catch the IO exception...
> {quote}
> I will attach a patch as well that I believe fixes the issues. Is there a
> reason to *not* perform a hard shutdown of the fysnc thread? It seems sane
> enough to me to do a hard shutdown of the fsyncService if the action is
> aborted...
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]