[ https://issues.apache.org/jira/browse/HDDS-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anu Engineer resolved HDDS-296.
-------------------------------
    Resolution: Implemented

Fixed via 355,356,357,358...

> OMMetadataManagerLock is held by getPendingDeletionKeys for a full table scan
> -----------------------------------------------------------------------------
>
>                 Key: HDDS-296
>                 URL: https://issues.apache.org/jira/browse/HDDS-296
>             Project: Hadoop Distributed Data Store
>          Issue Type: Bug
>            Reporter: Elek, Marton
>            Assignee: Anu Engineer
>            Priority: Critical
>             Fix For: 0.2.1
>
>         Attachments: local.png
>
>
> We identified the problem during freon tests on real clusters. I first saw it
> on a Kubernetes-based pseudo cluster (50 datanodes, 1 freon instance). After a
> while, the rate of key allocation slowed down (see the attached image).
> I could also reproduce the problem with a local cluster (using the
> hadoop-dist/target/compose/ozoneperf setup). After the first 1 million keys,
> key creation almost stopped.
> With the help of [~nandakumar131] we identified that the problem is the lock
> in the Ozone Manager. (We profiled the OM with VisualVM and found that the
> code was blocked on the lock for an extremely long time; we also checked the
> rocksdb/rpc metrics from Prometheus, and everything else was working well.)
> [~nandakumar131] suggested using an instrumented lock in the OMMetadataManager.
> With a custom build we identified that the deletion service holds the
> OMMetadataManager lock for a full range scan. For 1 million keys this took
> about 10 seconds (on my local developer machine with an SSD):
> {code}
> ozoneManager_1 | 2018-07-25 12:45:03 WARN OMMetadataManager:143 - Lock held
> time above threshold: lock identifier: OMMetadataManagerLock
> lockHeldTimeMs=2648 ms. Suppressed 0 lock warnings. The stack trace is:
> java.lang.Thread.getStackTrace(Thread.java:1559)
> ozoneManager_1 | org.apache.hadoop.util.StringUtils.getStackTrace(StringUtils.java:1032)
> ozoneManager_1 | org.apache.hadoop.util.InstrumentedLock.logWarning(InstrumentedLock.java:148)
> ozoneManager_1 | org.apache.hadoop.util.InstrumentedLock.check(InstrumentedLock.java:186)
> ozoneManager_1 | org.apache.hadoop.util.InstrumentedReadLock.unlock(InstrumentedReadLock.java:78)
> ozoneManager_1 | org.apache.hadoop.ozone.om.KeyManagerImpl.getPendingDeletionKeys(KeyManagerImpl.java:506)
> ozoneManager_1 | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:98)
> ozoneManager_1 | org.apache.hadoop.ozone.om.KeyDeletingService$KeyDeletingTask.call(KeyDeletingService.java:85)
> ozoneManager_1 | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1 | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1 | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1 | java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> ozoneManager_1 | java.util.concurrent.FutureTask.run(FutureTask.java:266)
> ozoneManager_1 | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> ozoneManager_1 | java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> ozoneManager_1 | java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> ozoneManager_1 | java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> ozoneManager_1 | java.lang.Thread.run(Thread.java:748)
> {code}
> I checked it with the DeletionService disabled and it worked well.
> The deletion service should be improved so that it works without long-term
> locking.
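> One possible direction, sketched below purely for illustration (this is not
> the code from the fix, and the class, method, and field names here are
> hypothetical placeholders rather than the real Ozone Manager API): scan the
> deleted keys in bounded batches and re-acquire the lock per batch, so other
> operations can make progress between batches instead of waiting out a full
> table scan.
> {code}
> import java.util.ArrayList;
> import java.util.List;
> import java.util.NavigableMap;
> import java.util.concurrent.locks.ReadWriteLock;
> import java.util.concurrent.locks.ReentrantReadWriteLock;
>
> /** Illustrative sketch only: batched scan with per-batch locking. */
> public class BatchedDeletionScan {
>
>   private static final int BATCH_SIZE = 1000;
>
>   private final ReadWriteLock metadataLock = new ReentrantReadWriteLock();
>   // Hypothetical stand-in for the deleted-key table in the OM metadata store.
>   private final NavigableMap<String, byte[]> deletedTable;
>
>   public BatchedDeletionScan(NavigableMap<String, byte[]> deletedTable) {
>     this.deletedTable = deletedTable;
>   }
>
>   /** Returns up to {@code limit} keys, holding the read lock per batch only. */
>   public List<String> getPendingDeletionKeys(int limit) {
>     List<String> result = new ArrayList<>();
>     String startAfter = null;                 // resume point between batches
>     while (result.size() < limit) {
>       metadataLock.readLock().lock();         // held for one small batch only
>       try {
>         List<String> batch =
>             scanBatch(startAfter, Math.min(BATCH_SIZE, limit - result.size()));
>         if (batch.isEmpty()) {
>           break;                              // table exhausted
>         }
>         result.addAll(batch);
>         startAfter = batch.get(batch.size() - 1);
>       } finally {
>         metadataLock.readLock().unlock();     // writers may run between batches
>       }
>     }
>     return result;
>   }
>
>   // Placeholder range scan over the backing store (RocksDB in the real OM).
>   private List<String> scanBatch(String startAfter, int max) {
>     NavigableMap<String, byte[]> range = (startAfter == null)
>         ? deletedTable : deletedTable.tailMap(startAfter, false);
>     List<String> batch = new ArrayList<>(max);
>     for (String key : range.keySet()) {
>       batch.add(key);
>       if (batch.size() >= max) {
>         break;
>       }
>     }
>     return batch;
>   }
> }
> {code}
> The trade-off is that the result is no longer a single consistent snapshot of
> the table; that is acceptable here, because keys added or removed mid-scan are
> simply picked up by a later run of the deletion service.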
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: hdfs-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: hdfs-dev-h...@hadoop.apache.org