[ https://issues.apache.org/jira/browse/SOLR-16043?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Kevin Risden updated SOLR-16043: -------------------------------- Description: This is caused by SOLR-15942. Some background: * originally discussed in 3.3.1 PR - https://github.com/apache/solr/pull/553#discussion_r792068327 * Failures so far from jenkins ** https://jenkins.thetaphi.de/job/Solr-main-Linux/3116/consoleText ** https://jenkins.thetaphi.de/job/Solr-main-Linux/3109/consoleText ** https://jenkins.thetaphi.de/job/Solr-main-MacOSX/678/consoleText It looks like it might be trying to shut down: {code:java} 2> 31835 ERROR (Command processor) [] o.a.h.h.s.d.DataNode Command processor encountered interrupt and exit. 2> 31835 WARN (BP-1491942626-127.0.0.1-1645493532707 heartbeating to localhost/127.0.0.1:51572) [] o.a.h.h.s.d.IncrementalBlockReportManager IncrementalBlockReportManager interrupted 2> 31835 WARN (Command processor) [] o.a.h.h.s.d.DataNode Ending command processor service for: Thread[Command processor,5,TGRP-HdfsBackupRepositoryIntegrationTest] {code} the error message is: {code:java} 2> SEVERE: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestHdfsBackupRestoreCore: 2> 1) Thread[id=874, name=Command processor, state=WAITING, group=TGRP-TestHdfsBackupRestoreCore] 2> at java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method) 2> at java.base@16.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341) 2> at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505) 2> at java.base@16.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137) 2> at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614) 2> at java.base@16.0.2/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435) 2> at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1331) 2> at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1315) 2> Feb 22, 2022 12:51:04 AM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll 2> INFO: Starting to interrupt leaked threads: 2> 1) Thread[id=874, name=Command processor, state=WAITING, group=TGRP-TestHdfsBackupRestoreCore] 2> 46271 ERROR (Command processor) [] o.a.h.h.s.d.DataNode Command processor encountered interrupt and exit. 2> 46272 WARN (Command processor) [] o.a.h.h.s.d.DataNode Ending command processor service for: Thread[Command processor,5,TGRP-TestHdfsBackupRestoreCore] 2> Feb 22, 2022 12:51:04 AM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll 2> INFO: All leaked threads terminated. > com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestHdfsBackupRestoreCore: > 1) Thread[id=874, name=Command processor, state=WAITING, group=TGRP-TestHdfsBackupRestoreCore] > at java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method) > at java.base@16.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341) > at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505) > at java.base@16.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137) > at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614) > at java.base@16.0.2/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435) > at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1331) > at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1315) > at __randomizedtesting.SeedInfo.seed([239C064719AD54A3]:0) 2> NOTE: test params are: codec=Asserting(Lucene90): {_root_=Lucene90, name=PostingsFormat(name=MockRandom), id=PostingsFormat(name=MockRandom)}, docValues:{}, maxPointsInLeafNode=2008, maxMBSortInHeap=7.425444076012971, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=bs-Cyrl-BA, timezone=America/Argentina/San_Luis 2> NOTE: Linux 5.11.0-41-generic amd64/Eclipse Foundation 16.0.2 (64-bit)/cpus=16,threads=132,free=90931432,total=259981312 2> NOTE: All tests run in this JVM: [HdfsCollectionsAPIDistributedZkTest, HDFSCollectionsAPITest, HdfsTlogReplayBufferedWhileIndexingTest, MoveReplicaHDFSTest, TestHdfsBackupRestoreCore] 2> NOTE: reproduce with: gradlew test --tests TestHdfsBackupRestoreCore -Dtests.seed=239C064719AD54A3 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=bs-Cyrl-BA -Dtests.timezone=America/Argentina/San_Luis -Dtests.asserts=true -Dtests.file.encoding=UTF-8 {code} Some related code pointers: * https://github.com/apache/hadoop/blame/rel/release-3.3.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java#L1336 * https://github.com/apache/hadoop/blame/rel/release-3.3.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L2017 PS: there is an error about datanode mbean - but it is unrelated as far as I can tell - https://issues.apache.org/jira/browse/HDFS-11041 was: This is caused by SOLR-15942. Some background: * originally discussed in 3.3.1 PR - https://github.com/apache/solr/pull/553#discussion_r792068327 * Failures so far from jenkins ** https://jenkins.thetaphi.de/job/Solr-main-Linux/3116/consoleText ** https://jenkins.thetaphi.de/job/Solr-main-Linux/3109/consoleText ** https://jenkins.thetaphi.de/job/Solr-main-MacOSX/678/consoleText It looks like it might be trying to shut down: {code:java} 2> 31835 ERROR (Command processor) [] o.a.h.h.s.d.DataNode Command processor encountered interrupt and exit. 2> 31835 WARN (BP-1491942626-127.0.0.1-1645493532707 heartbeating to localhost/127.0.0.1:51572) [] o.a.h.h.s.d.IncrementalBlockReportManager IncrementalBlockReportManager interrupted 2> 31835 WARN (Command processor) [] o.a.h.h.s.d.DataNode Ending command processor service for: Thread[Command processor,5,TGRP-HdfsBackupRepositoryIntegrationTest] {code} the error message is: {code:java} 2> SEVERE: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestHdfsBackupRestoreCore: 2> 1) Thread[id=874, name=Command processor, state=WAITING, group=TGRP-TestHdfsBackupRestoreCore] 2> at java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method) 2> at java.base@16.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341) 2> at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505) 2> at java.base@16.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137) 2> at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614) 2> at java.base@16.0.2/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435) 2> at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1331) 2> at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1315) 2> Feb 22, 2022 12:51:04 AM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll 2> INFO: Starting to interrupt leaked threads: 2> 1) Thread[id=874, name=Command processor, state=WAITING, group=TGRP-TestHdfsBackupRestoreCore] 2> 46271 ERROR (Command processor) [] o.a.h.h.s.d.DataNode Command processor encountered interrupt and exit. 2> 46272 WARN (Command processor) [] o.a.h.h.s.d.DataNode Ending command processor service for: Thread[Command processor,5,TGRP-TestHdfsBackupRestoreCore] 2> Feb 22, 2022 12:51:04 AM com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll 2> INFO: All leaked threads terminated. > com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked from SUITE scope at org.apache.solr.handler.TestHdfsBackupRestoreCore: > 1) Thread[id=874, name=Command processor, state=WAITING, group=TGRP-TestHdfsBackupRestoreCore] > at java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method) > at java.base@16.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341) > at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505) > at java.base@16.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137) > at java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614) > at java.base@16.0.2/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435) > at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1331) > at app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1315) > at __randomizedtesting.SeedInfo.seed([239C064719AD54A3]:0) 2> NOTE: test params are: codec=Asserting(Lucene90): {_root_=Lucene90, name=PostingsFormat(name=MockRandom), id=PostingsFormat(name=MockRandom)}, docValues:{}, maxPointsInLeafNode=2008, maxMBSortInHeap=7.425444076012971, sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=bs-Cyrl-BA, timezone=America/Argentina/San_Luis 2> NOTE: Linux 5.11.0-41-generic amd64/Eclipse Foundation 16.0.2 (64-bit)/cpus=16,threads=132,free=90931432,total=259981312 2> NOTE: All tests run in this JVM: [HdfsCollectionsAPIDistributedZkTest, HDFSCollectionsAPITest, HdfsTlogReplayBufferedWhileIndexingTest, MoveReplicaHDFSTest, TestHdfsBackupRestoreCore] 2> NOTE: reproduce with: gradlew test --tests TestHdfsBackupRestoreCore -Dtests.seed=239C064719AD54A3 -Dtests.multiplier=3 -Dtests.slow=true -Dtests.locale=bs-Cyrl-BA -Dtests.timezone=America/Argentina/San_Luis -Dtests.asserts=true -Dtests.file.encoding=UTF-8 {code} > HDFS tests - "Command processor" thread leak > -------------------------------------------- > > Key: SOLR-16043 > URL: https://issues.apache.org/jira/browse/SOLR-16043 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: hdfs, Tests > Reporter: Kevin Risden > Priority: Major > > This is caused by SOLR-15942. Some background: > * originally discussed in 3.3.1 PR - > https://github.com/apache/solr/pull/553#discussion_r792068327 > * Failures so far from jenkins > ** https://jenkins.thetaphi.de/job/Solr-main-Linux/3116/consoleText > ** https://jenkins.thetaphi.de/job/Solr-main-Linux/3109/consoleText > ** https://jenkins.thetaphi.de/job/Solr-main-MacOSX/678/consoleText > It looks like it might be trying to shut down: > {code:java} > 2> 31835 ERROR (Command processor) [] o.a.h.h.s.d.DataNode Command processor > encountered interrupt and exit. > 2> 31835 WARN (BP-1491942626-127.0.0.1-1645493532707 heartbeating to > localhost/127.0.0.1:51572) [] o.a.h.h.s.d.IncrementalBlockReportManager > IncrementalBlockReportManager interrupted > 2> 31835 WARN (Command processor) [] o.a.h.h.s.d.DataNode Ending command > processor service for: Thread[Command > processor,5,TGRP-HdfsBackupRepositoryIntegrationTest] > {code} > the error message is: > {code:java} > 2> SEVERE: 1 thread leaked from SUITE scope at > org.apache.solr.handler.TestHdfsBackupRestoreCore: > 2> 1) Thread[id=874, name=Command processor, state=WAITING, > group=TGRP-TestHdfsBackupRestoreCore] > 2> at java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native Method) > 2> at > java.base@16.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341) > 2> at > java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505) > 2> at > java.base@16.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137) > 2> at > java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614) > 2> at > java.base@16.0.2/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435) > 2> at > app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1331) > 2> at > app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1315) > 2> Feb 22, 2022 12:51:04 AM > com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll > 2> INFO: Starting to interrupt leaked threads: > 2> 1) Thread[id=874, name=Command processor, state=WAITING, > group=TGRP-TestHdfsBackupRestoreCore] > 2> 46271 ERROR (Command processor) [] o.a.h.h.s.d.DataNode Command > processor encountered interrupt and exit. > 2> 46272 WARN (Command processor) [] o.a.h.h.s.d.DataNode Ending command > processor service for: Thread[Command > processor,5,TGRP-TestHdfsBackupRestoreCore] > 2> Feb 22, 2022 12:51:04 AM > com.carrotsearch.randomizedtesting.ThreadLeakControl tryToInterruptAll > 2> INFO: All leaked threads terminated. > > com.carrotsearch.randomizedtesting.ThreadLeakError: 1 thread leaked > from SUITE scope at org.apache.solr.handler.TestHdfsBackupRestoreCore: > > 1) Thread[id=874, name=Command processor, state=WAITING, > group=TGRP-TestHdfsBackupRestoreCore] > > at java.base@16.0.2/jdk.internal.misc.Unsafe.park(Native > Method) > > at > java.base@16.0.2/java.util.concurrent.locks.LockSupport.park(LockSupport.java:341) > > at > java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionNode.block(AbstractQueuedSynchronizer.java:505) > > at > java.base@16.0.2/java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3137) > > at > java.base@16.0.2/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1614) > > at > java.base@16.0.2/java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:435) > > at > app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.processQueue(BPServiceActor.java:1331) > > at > app//org.apache.hadoop.hdfs.server.datanode.BPServiceActor$CommandProcessingThread.run(BPServiceActor.java:1315) > > at __randomizedtesting.SeedInfo.seed([239C064719AD54A3]:0) > 2> NOTE: test params are: codec=Asserting(Lucene90): {_root_=Lucene90, > name=PostingsFormat(name=MockRandom), id=PostingsFormat(name=MockRandom)}, > docValues:{}, maxPointsInLeafNode=2008, maxMBSortInHeap=7.425444076012971, > sim=Asserting(RandomSimilarity(queryNorm=true): {}), locale=bs-Cyrl-BA, > timezone=America/Argentina/San_Luis > 2> NOTE: Linux 5.11.0-41-generic amd64/Eclipse Foundation 16.0.2 > (64-bit)/cpus=16,threads=132,free=90931432,total=259981312 > 2> NOTE: All tests run in this JVM: [HdfsCollectionsAPIDistributedZkTest, > HDFSCollectionsAPITest, HdfsTlogReplayBufferedWhileIndexingTest, > MoveReplicaHDFSTest, TestHdfsBackupRestoreCore] > 2> NOTE: reproduce with: gradlew test --tests TestHdfsBackupRestoreCore > -Dtests.seed=239C064719AD54A3 -Dtests.multiplier=3 -Dtests.slow=true > -Dtests.locale=bs-Cyrl-BA -Dtests.timezone=America/Argentina/San_Luis > -Dtests.asserts=true -Dtests.file.encoding=UTF-8 > {code} > Some related code pointers: > * > https://github.com/apache/hadoop/blame/rel/release-3.3.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/BPServiceActor.java#L1336 > * > https://github.com/apache/hadoop/blame/rel/release-3.3.1/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L2017 > PS: there is an error about datanode mbean - but it is unrelated as far as I > can tell - https://issues.apache.org/jira/browse/HDFS-11041 -- This message was sent by Atlassian Jira (v8.20.1#820001) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@solr.apache.org For additional commands, e-mail: issues-h...@solr.apache.org