[ https://issues.apache.org/jira/browse/SOLR-15254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17655114#comment-17655114 ]
Michael Gibney commented on SOLR-15254: --------------------------------------- This is good to know; thanks for fixing, and filling in the extra context! There's some relevant discussion on SOLR-16167, which also has attached a report from the intellij idea IDE -- catches some of these cases, but likely misses stuff that's not as straightforward to detect. > Random deadlock during cores initialization > ------------------------------------------- > > Key: SOLR-15254 > URL: https://issues.apache.org/jira/browse/SOLR-15254 > Project: Solr > Issue Type: Bug > Affects Versions: 8.5.1 > Environment: openjdk 11.0.9.1 > Reporter: Thomas Mortagne > Priority: Major > > At XWiki we are using Solr in an embbedded way with 5 cores, and sometimes we > end up with what appears to be a race condition leading to a deadlock between > several of the cores's configuration loading while loading some required > classes. > Here is an extract of a thread dump (also attached is a more complete version > of the thread dump) we get in such a case: > {noformat} > "coreLoadExecutor-9-thread-1" #41 prio=5 os_prio=0 cpu=305.06ms > elapsed=101.24s tid=0x00007fb5cfc8f800 nid=0x57a in Object.wait() > [0x00007fb5717ec000] > java.lang.Thread.State: RUNNABLE > at java.lang.Class.forName0(java.base@11.0.9.1/Native Method) > at java.lang.Class.forName(java.base@11.0.9.1/Class.java:398) > at > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:554) > at > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:639) > at > org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:339) > at > org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:332) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152) > at > org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:356) > at > org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:90) > at > org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:48) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152) > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502) > at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:178) > at > org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:105) > at > org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:178) > at > org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:46) > at > org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:145) > at > org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83) > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302) > at > org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802) > at > org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown > Source) > at > org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38) > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) > at > java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown > Source) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628) > at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834) > "coreLoadExecutor-9-thread-3" #43 prio=5 os_prio=0 cpu=294.37ms > elapsed=101.24s tid=0x00007fb5cfcdf800 nid=0x57c in Object.wait() > [0x00007fb5715ea000] > java.lang.Thread.State: RUNNABLE > at java.lang.Class.forName0(java.base@11.0.9.1/Native Method) > at java.lang.Class.forName(java.base@11.0.9.1/Class.java:398) > at > org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:554) > at > org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:639) > at > org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:339) > at > org.apache.solr.schema.FieldTypePluginLoader$3.create(FieldTypePluginLoader.java:332) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152) > at > org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:356) > at > org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:90) > at > org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:48) > at > org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:152) > at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:502) > at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:178) > at > org.apache.solr.schema.ManagedIndexSchema.<init>(ManagedIndexSchema.java:105) > at > org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:178) > at > org.apache.solr.schema.ManagedIndexSchemaFactory.create(ManagedIndexSchemaFactory.java:46) > at > org.apache.solr.core.ConfigSetService.createIndexSchema(ConfigSetService.java:145) > at > org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:83) > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302) > at > org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802) > at > org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown > Source) > at > org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38) > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) > at > java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown > Source) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628) > at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834) > "coreLoadExecutor-9-thread-2" #42 prio=5 os_prio=0 cpu=388.81ms > elapsed=101.24s tid=0x00007fb5cfcde800 nid=0x57b in Object.wait() > [0x00007fb5716eb000] > java.lang.Thread.State: RUNNABLE > at > jdk.internal.misc.Unsafe.ensureClassInitialized0(java.base@11.0.9.1/Native > Method) > at > jdk.internal.misc.Unsafe.ensureClassInitialized(java.base@11.0.9.1/Unsafe.java:1042) > at > jdk.internal.reflect.UnsafeFieldAccessorFactory.newFieldAccessor(java.base@11.0.9.1/UnsafeFieldAccessorFactory.java:43) > at > jdk.internal.reflect.ReflectionFactory.newFieldAccessor(java.base@11.0.9.1/ReflectionFactory.java:186) > at > java.lang.reflect.Field.acquireFieldAccessor(java.base@11.0.9.1/Field.java:1105) > at > java.lang.reflect.Field.getFieldAccessor(java.base@11.0.9.1/Field.java:1086) > at java.lang.reflect.Field.get(java.base@11.0.9.1/Field.java:418) > at > org.apache.lucene.analysis.util.AbstractAnalysisFactory.lookupSPIName(AbstractAnalysisFactory.java:358) > at > org.apache.lucene.analysis.util.AnalysisSPILoader.reload(AnalysisSPILoader.java:90) > - locked <0x00000000e98e20a0> (a > org.apache.lucene.analysis.util.AnalysisSPILoader) > at > org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:66) > at > org.apache.lucene.analysis.util.AnalysisSPILoader.<init>(AnalysisSPILoader.java:52) > at > org.apache.lucene.analysis.util.TokenFilterFactory.<clinit>(TokenFilterFactory.java:33) > at > org.apache.solr.core.SolrResourceLoader.reloadLuceneSPI(SolrResourceLoader.java:256) > at > org.apache.solr.core.SolrResourceLoader.<init>(SolrResourceLoader.java:196) > at > org.apache.solr.core.ConfigSetService$Standalone.createCoreResourceLoader(ConfigSetService.java:207) > at > org.apache.solr.core.ConfigSetService.loadConfigSet(ConfigSetService.java:66) > at > org.apache.solr.core.CoreContainer.createFromDescriptor(CoreContainer.java:1302) > at > org.apache.solr.core.CoreContainer.lambda$load$13(CoreContainer.java:802) > at > org.apache.solr.core.CoreContainer$$Lambda$569/0x0000000840bc7040.call(Unknown > Source) > at > org.glowroot.agent.plugin.executor.CallableWrapper.call(CallableWrapper.java:38) > at > com.codahale.metrics.InstrumentedExecutorService$InstrumentedCallable.call(InstrumentedExecutorService.java:202) > at > java.util.concurrent.FutureTask.run(java.base@11.0.9.1/FutureTask.java:264) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown > Source) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628) > at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834) > "main" #1 prio=5 os_prio=0 cpu=25108.09ms elapsed=133.50s > tid=0x00007fb5cc018800 nid=0x541 waiting on condition [0x00007fb5d2f73000] > java.lang.Thread.State: TIMED_WAITING (parking) > at jdk.internal.misc.Unsafe.park(java.base@11.0.9.1/Native Method) > - parking to wait for <0x00000000ea480000> (a > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject) > at > java.util.concurrent.locks.LockSupport.parkNanos(java.base@11.0.9.1/LockSupport.java:234) > at > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(java.base@11.0.9.1/AbstractQueuedSynchronizer.java:2123) > at > java.util.concurrent.ThreadPoolExecutor.awaitTermination(java.base@11.0.9.1/ThreadPoolExecutor.java:1454) > at > com.codahale.metrics.InstrumentedExecutorService.awaitTermination(InstrumentedExecutorService.java:163) > at > org.apache.solr.common.util.ExecutorUtil.awaitTermination(ExecutorUtil.java:85) > at > org.apache.solr.common.util.ExecutorUtil.shutdownAndAwaitTermination(ExecutorUtil.java:77) > at org.apache.solr.core.CoreContainer.load(CoreContainer.java:843) > "h2sc-4-thread-1" #38 prio=5 os_prio=0 cpu=0.20ms elapsed=101.89s > tid=0x00007fb5cfc16800 nid=0x577 runnable [0x00007fb571cf0000] > java.lang.Thread.State: RUNNABLE > at sun.nio.ch.EPoll.wait(java.base@11.0.9.1/Native Method) > at > sun.nio.ch.EPollSelectorImpl.doSelect(java.base@11.0.9.1/EPollSelectorImpl.java:120) > at > sun.nio.ch.SelectorImpl.lockAndDoSelect(java.base@11.0.9.1/SelectorImpl.java:124) > - locked <0x00000000e9819cf0> (a sun.nio.ch.Util$2) > - locked <0x00000000e9819c98> (a sun.nio.ch.EPollSelectorImpl) > at > sun.nio.ch.SelectorImpl.select(java.base@11.0.9.1/SelectorImpl.java:141) > at > org.eclipse.jetty.io.ManagedSelector.nioSelect(ManagedSelector.java:149) > at org.eclipse.jetty.io.ManagedSelector.select(ManagedSelector.java:156) > at > org.eclipse.jetty.io.ManagedSelector$SelectorProducer.select(ManagedSelector.java:572) > at > org.eclipse.jetty.io.ManagedSelector$SelectorProducer.produce(ManagedSelector.java:509) > at > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produceTask(EatWhatYouKill.java:360) > at > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:184) > at > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171) > at > org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135) > at > org.eclipse.jetty.io.ManagedSelector$$Lambda$437/0x0000000840b28c40.run(Unknown > Source) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) > at > org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$438/0x0000000840b29440.run(Unknown > Source) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.9.1/ThreadPoolExecutor.java:1128 > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.9.1/ThreadPoolExecutor.java:628) > at java.lang.Thread.run(java.base@11.0.9.1/Thread.java:834) > {noformat} > Notice the coreLoadExecutor-9-thread-* thread which despite their "RUNNABLE" > state are actually internally blocked waiting on some unknown object. The > main thread is stuck because it's waiting for the cores's initialization to > finish. > I included "h2sc-4-thread-1" to have all the threads doing stuff but it does > not seems to have much to do with the deadlock (other than being a Solr thing > too). > It's not very easy to debug since we cannot always reproduce but maybe you > guys have a clue for where this issue might be comming from based on those > informations. > When we hit this problem we can "workaround" it by restarting a few times > until it pass the Solr init. > We never had this issue back when we only had 1 or 2 cores but having 5 of > those is very recent for us so I cannot really tell if it's some recent Solr > regression or an old bug we did not had before because we did not had enough > cores to trigger it. -- This message was sent by Atlassian Jira (v8.20.10#820010) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org