Dmitriy Pavlov created IGNITE-7507:
--------------------------------------

             Summary: Ignite node performance drop during checkpoint start: store metapage eviction causes long checkpoint lock hold time
                 Key: IGNITE-7507
                 URL: https://issues.apache.org/jira/browse/IGNITE-7507
             Project: Ignite
          Issue Type: Bug
          Components: persistence
            Reporter: Dmitriy Pavlov
            Assignee: Dmitriy Pavlov
             Fix For: 2.5

Eviction of store metadata pages becomes a very expensive operation during checkpoint start. Reading these pages hangs the Ignite node until the metadata is loaded from disk.

The following store (partition) metapages are required during execution of saveStoreMetadata() & markCheckpointBegin():
- Partition Metadata Page
- Freelist Meta Page
- Partition Counters IO

If such a page is not available in memory, it is loaded from disk. But these loads are done while holding checkpointLock (in write mode).

Example of timing:
- checkpointLockWait=75ms, checkpointLockHoldTime=2653ms, pages=696120

During all this time worker threads are not able to put any data into any cache.

It is required to avoid eviction of such pages (evict them with lower priority than dirty pages).

(Full stacktrace)
{noformat}
db-checkpoint-thread-#40%checkpoint.IgniteMassLoadSandboxTest1% Id=63 WAITING
    at sun.misc.Unsafe.park(Native Method)
    at java.util.concurrent.locks.LockSupport.park(LockSupport.java:304)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.get0(GridFutureAdapter.java:177)
    at org.apache.ignite.internal.util.future.GridFutureAdapter.getUninterruptibly(GridFutureAdapter.java:145)
    at org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIO.read(AsyncFileIO.java:95)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStore.read(FilePageStore.java:324)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:306)
    at org.apache.ignite.internal.processors.cache.persistence.file.FilePageStoreManager.read(FilePageStoreManager.java:291)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:656)
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.acquirePage(PageMemoryImpl.java:576)
    at org.apache.ignite.internal.processors.cache.persistence.DataStructure.acquirePage(DataStructure.java:130)
    at org.apache.ignite.internal.processors.cache.persistence.freelist.PagesList.saveMetadata(PagesList.java:301)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.saveStoreMetadata(GridCacheOffheapManager.java:196)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager.onCheckpointBegin(GridCacheOffheapManager.java:168)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.markCheckpointBegin(GridCacheDatabaseSharedManager.java:3022)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.doCheckpoint(GridCacheDatabaseSharedManager.java:2719)
    at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager$Checkpointer.body(GridCacheDatabaseSharedManager.java:2644)
    at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
    at java.lang.Thread.run(Thread.java:748)
{noformat}
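
One possible reading of the proposal (evict store metapages with lower priority than dirty pages) is sketched below. This is only an illustration, not Ignite code: EvictionSketch, PageKind, Page and pickVictim are hypothetical names, and the assumption that clean data pages are evicted before dirty ones is not stated in this issue.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

/** Hypothetical sketch of priority-based victim selection; not PageMemoryImpl. */
final class EvictionSketch {
    /** Eviction classes in eviction order: the first constant is evicted first. */
    enum PageKind { CLEAN_DATA, DIRTY_DATA, STORE_META }

    /** Minimal stand-in for a page resident in memory. */
    static final class Page {
        final long id;
        final PageKind kind;
        final long lastAccessTs;

        Page(long id, PageKind kind, long lastAccessTs) {
            this.id = id;
            this.kind = kind;
            this.lastAccessTs = lastAccessTs;
        }
    }

    /**
     * Picks the eviction victim among the candidates: the lowest PageKind wins,
     * and ties are broken by the oldest access timestamp. A store metapage is
     * therefore chosen only when no data page (clean or dirty) is a candidate.
     */
    static Optional<Page> pickVictim(List<Page> candidates) {
        return candidates.stream().min(
            Comparator.comparing((Page p) -> p.kind)
                .thenComparingLong(p -> p.lastAccessTs));
    }
}
```

With such an ordering, a partition metapage stays in memory as long as any data page can be evicted instead, so saveStoreMetadata() would less often need a disk read while checkpointLock is held in write mode.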