Hi,

From my perspective, this looks like your physical storage is not fast enough to handle the incoming writes: even with throttling applied, markDirty (2121 pages/sec) runs almost twice as fast as checkpointWrite (1219 pages/sec). You've mentioned that the Ignite work folder is stored on SSD, but what about the PDS folder (DataStorageConfiguration.setStoragePath())?
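To put numbers on this, here is a small hypothetical helper (not part of Ignite) that derives the effective checkpoint write rate from the quoted "Checkpoint finished" line (pages=40505, pagesWrite=25018ms) and the markDirty/checkpointWrite ratio from the throttling line. It assumes the default 4 KB page size, since pageSize isn't in the posted configuration.

```java
public class CheckpointThroughput {

    // Assumption: default Ignite page size of 4 KB; pageSize is not in the posted config.
    static final int PAGE_SIZE_BYTES = 4096;

    // Pages written per second during the checkpoint's page-write phase.
    static double pagesPerSecond(long pages, long pagesWriteMillis) {
        return pages * 1000.0 / pagesWriteMillis;
    }

    // Converts a page rate into MiB/s for the given page size in bytes.
    static double mibPerSecond(double pagesPerSec, int pageSizeBytes) {
        return pagesPerSec * pageSizeBytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // From "Checkpoint finished [... pages=40505 ... pagesWrite=25018ms ...]"
        double writeRate = pagesPerSecond(40_505, 25_018);
        // From the throttling line: markDirty=2121 pages/sec, checkpointWrite=1219 pages/sec
        double dirtyToWrite = 2121.0 / 1219.0;

        System.out.printf("checkpoint write rate: %.0f pages/s (~%.1f MiB/s)%n",
                writeRate, mibPerSecond(writeRate, PAGE_SIZE_BYTES));
        System.out.printf("markDirty / checkpointWrite: %.2f%n", dirtyToWrite);
    }
}
```

With those inputs it works out to roughly 1,600 pages/s, or about 6.3 MiB/s of page writes, and a markDirty/checkpointWrite ratio of about 1.74. That is far less write bandwidth than an SSD would normally be expected to deliver, which is why the disk holding the page files is the first thing I'd check.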

Btw, have you tested your setup with DirectIO disabled?
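Both checks can be made explicit in the node's startup code. Below is a minimal sketch, assuming a Java-configured server node. The mount points are hypothetical placeholders for your SSD volumes. IGNITE_DIRECT_IO_ENABLED is, to my knowledge, the system property Ignite consults before turning DirectIO on; passing -DIGNITE_DIRECT_IO_ENABLED=false to the JVM, or removing the ignite-direct-io module from the classpath, should have the same effect. If setStoragePath() is left unset, the page files end up under the work directory.

```java
import org.apache.ignite.Ignition;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

public class PersistentServerNode {
    public static void main(String[] args) {
        // Comparison run only: disable DirectIO before the node starts
        // (same intent as the JVM flag -DIGNITE_DIRECT_IO_ENABLED=false).
        System.setProperty("IGNITE_DIRECT_IO_ENABLED", "false");

        DataStorageConfiguration storageCfg = new DataStorageConfiguration()
            // Hypothetical SSD mount points; replace with the real volumes.
            .setStoragePath("/mnt/ssd-data/ignite/db")                  // partition (page) files
            .setWalPath("/mnt/ssd-wal/ignite/wal")                      // active WAL segments
            .setWalArchivePath("/mnt/ssd-wal-archive/ignite/archive");  // archived WAL segments

        // Default data region as described in the original post: 24 GB, persistent.
        storageCfg.getDefaultDataRegionConfiguration()
            .setPersistenceEnabled(true)
            .setMaxSize(24L * 1024 * 1024 * 1024);

        IgniteConfiguration cfg = new IgniteConfiguration()
            .setDataStorageConfiguration(storageCfg);

        // Starts a server node with the configuration above.
        Ignition.start(cfg);
    }
}
```

Comparing the pagesWrite duration in the "Checkpoint finished" lines between runs with and without the property set should show whether DirectIO is a factor.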

On 2022/04/14 10:55:23 Surinder Mehra wrote:
> Hi,
> We have an application with Ignite thick clients that writes to Ignite
> caches on a separately deployed Ignite grid. Below is the Ignite
> configuration per node:
>
> With this configuration, we see throttling, and checkpoints take
> 20-30 seconds. Did we miss something in the configuration, or are there
> other settings we can enable? Any suggestions would be of great help.
>
> * 100-200 concurrent writes to a 25-node cluster
> * partitions: 512
> * cache backups: 2
> * cache mode: partitioned
> * synchronizationMode: primary sync
> * off-heap caches
> * server nodes: 25
> * RAM: 64G
> * maxDirectMemorySize: 4G
> * heap: 25G
>
> * persistenceEnabled: true
> * data region size: 24GB
> * checkpointingBufferSize: 6GB
> * walSegmentSize: 1G
> * walBufferSize: 256MB
> * walArchiveSize: 24G
> * writeThrottlingEnabled: true
> * checkpointingFreq: 60 sec
> * checkpointingThreads: 4
> * DirectIO enabled: true
>
> SSDs attached:
> work volume: 20G
> WAL volume: 15G
> WAL archive volume: 26G
>
>
> Checkpointing logs:
>
> [10:27:13,237][INFO][db-checkpoint-thread-#230][Checkpointer] Checkpoint
> started [checkpointId=11749dc0-fd0d-4b5f-8b9a-510e774fec38,
> startPtr=WALPointer [idx=26, fileOff=385214751, len=16683],
> checkpointBeforeLockTime=29ms, checkpointLockWait=0ms,
> checkpointListenersExecuteTime=2ms, checkpointLockHoldTime=3ms,
> walCpRecordFsyncDuration=11ms, writeCheckpointEntryDuration=3ms,
> splitAndSortCpPagesDuration=30ms, pages=40505, reason='timeout']
> [10:27:13,242][INFO][sys-stripe-7-#8][PageMemoryImpl] Throttling is applied
> to page modifications [percentOfPartTime=0.88, markDirty=2121 pages/sec,
> checkpointWrite=1219 pages/sec, estIdealMarkDirty=0 pages/sec,
> curDirty=0.00, maxDirty=0.02, avgParkTime=410172 ns, pages: (total=40505,
> evicted=0, written=10, synced=0, cpBufUsed=133, cpBufTotal=1554645)]
> [10:27:29,935][INFO][grid-timeout-worker-#30][IgniteKernal]
> Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> ^-- Node [id=214f3c2b, uptime=00:45:00.227]
> ^-- Cluster [hosts=45, CPUs=540, servers=25, clients=20, topVer=75,
> minorTopVer=0]
> ^-- Network [addrs=[127.0.0.1, 192.168.98.141], discoPort=47500,
> commPort=47100]
> ^-- CPU [CPUs=12, curLoad=3.67%, avgLoad=0.82%, GC=0%]
> ^-- Heap [used=5330MB, free=79.18%, comm=20480MB]
> ^-- Off-heap memory [used=1019MB, free=95.92%, allocated=24775MB]
> ^-- Page memory [pages=257976]
> ^-- sysMemPlc region [type=internal, persistence=true,
> lazyAlloc=false,
> ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%,
> allocRam=99MB, allocTotal=0MB]
> ^-- default region [type=default, persistence=true, lazyAlloc=true,
> ... initCfg=24576MB, maxCfg=24576MB, usedRam=1018MB, freeRam=95.86%,
> allocRam=24576MB, allocTotal=3820MB]
> ^-- metastoreMemPlc region [type=internal, persistence=true,
> lazyAlloc=false,
> ... initCfg=40MB, maxCfg=100MB, usedRam=1MB, freeRam=98.78%,
> allocRam=0MB, allocTotal=1MB]
> ^-- TxLog region [type=internal, persistence=true, lazyAlloc=false,
> ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%,
> allocRam=99MB, allocTotal=0MB]
> ^-- volatileDsMemPlc region [type=user, persistence=false,
> lazyAlloc=true,
> ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%,
> allocRam=0MB]
> ^-- Ignite persistence [used=3821MB]
> ^-- Outbound messages queue [size=0]
> ^-- Public thread pool [active=0, idle=0, qSize=0]
> ^-- System thread pool [active=0, idle=7, qSize=0]
> ^-- Striped thread pool [active=0, idle=12, qSize=0]
> [10:27:38,261][INFO][db-checkpoint-thread-#230][Checkpointer] Checkpoint
> finished [cpId=11749dc0-fd0d-4b5f-8b9a-510e774fec38, pages=40505,
> markPos=WALPointer [idx=26, fileOff=385214751, len=16683],
> walSegmentsCovered=[], markDuration=47ms, pagesWrite=25018ms, fsync=6ms,
> total=25100ms]
>
