Your checkpointing is writing ~40k pages to the store in 25 seconds. Assuming the default 4 KB page size, that is about 160 MB in 25 seconds, or under 7 MB/s. On an SSD. Which seems slow.
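As a cross-check, here is the arithmetic behind that estimate, using the page count from the checkpoint log below (pages=40505) and Ignite's default 4 KB page size; the ~25 s comes from the pagesWrite phase of the "Checkpoint finished" message:

```java
// Back-of-the-envelope checkpoint throughput from the numbers in the log:
// 40505 pages * 4096 bytes, written over the ~25 s pagesWrite phase.
public class CheckpointMath {
    static double mbPerSec(long pages, long pageSizeBytes, double seconds) {
        return pages * pageSizeBytes / (1024.0 * 1024.0) / seconds;
    }

    public static void main(String[] args) {
        double totalMb = 40505L * 4096 / (1024.0 * 1024.0); // ~158 MB per checkpoint
        double rate = mbPerSec(40505, 4096, 25.0);          // ~6.3 MB/s
        System.out.printf("%.0f MB in 25 s -> %.1f MB/s%n", totalMb, rate);
    }
}
```

Note this counts only checkpoint page writes; WAL and archive traffic land on top of it.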
But... your SSD volumes are really small, so perhaps you are partitioning a single large SSD device into 25 GB slices? If so, then that SSD is writing ~160 MB/s in aggregate, along with all kinds of overhead from lots of networked access to it. If you are also using the same partitioned SSD device for the data, WAL and archive, that would compound things. If this is not the case, and there is a dedicated physical SSD for the data store on each server, then 7 MB/s sounds appalling!

Raymond.

On Mon, Apr 18, 2022 at 12:48 PM Ilya Korol <llivezk...@gmail.com> wrote:

> 1. We have DirectIO disabled: does enabling it impact the performance
> if enabled?
> It should increase performance, but it's always worth doing some
> benchmarks before using such features in production.
>
> 2. When we disabled throttling, we saw 5x better performance. Struggling
> load test completed in 1/5th of the time. What are its side effects if we
> keep it disabled?
> You can safely disable it. In this case throttling will still be
> present, but it would use less intelligent strategies (that also, from
> time to time, may work incorrectly).
>
> 3. Does MaxMemoryDirectSize have any relation to throughput rate?
> I don't know anything regarding this.
>
> From my perspective your checkpointBuffSize is enough; take a look at
> your log messages: cpBufUsed=133, cpBufTotal=1554645
>
> Increasing checkpoint frequency should spread IO pressure more evenly
> over time, but as I mentioned before, if you decide to increase/decrease
> any IO parameter you would do better to benchmark how it impacts
> your setup.
>
> On 2022/04/17 05:43:57 Surinder Mehra wrote:
> > Hey, thanks for replying. We haven't configured the storage path, so by
> > default it should be in the work directory. work, wal and walarchive all
> > three are SSDs. I have the following queries.
> >
> > 1. We have DirectIO disabled: does enabling it impact the performance
> > if enabled?
> > 2. When we disabled throttling, we saw 5x better performance.
> > Struggling
> > load test completed in 1/5th of the time. What are its side effects if
> > we keep it disabled?
> > 3. Does MaxMemoryDirectSize have any relation to throughput rate?
> > 4. Can the current configuration mentioned in the previous thread be
> > scaled further, like increasing WalSegment size beyond 1 GB and the
> > related sizes of walArchive, checkpointBufferSize and the
> > MaxMemoryDirectSize JVM parameter?
> > 5. We see now, due to throttling being disabled, that the WalArchive
> > size is going beyond 50 GB (WalSegment size 1 GB and checkpoint buffer
> > size 6 GB). Would decreasing checkpoint frequency and/or increasing the
> > checkpoint thread count increase throughput, or impact application
> > writes inversely? Currently checkpointing frequency and threads are
> > default.
> >
> > On Sun, Apr 17, 2022 at 6:33 AM Ilya Korol <ll...@gmail.com> wrote:
> >
> > > Hi. From my perspective this looks like your physical storage is not
> > > fast enough to handle incoming writes. markDirty speed is 2x faster
> > > than checkpointWrite even in the presence of throttling. You've
> > > mentioned that the ignite work folder is stored on SSD, but what
> > > about the PDS folder (DataStorageConfiguration.setStoragePath())?
> > >
> > > Btw, have you tested your setup with DirectIO disabled?
> > >
> > > On 2022/04/14 10:55:23 Surinder Mehra wrote:
> > > > Hi,
> > > > We have an application with ignite thick clients which write to
> > > > ignite caches on an ignite grid deployed separately. Below is the
> > > > ignite configuration per node.
> > > >
> > > > With this configuration, we see throttling happening and
> > > > checkpointing time is between 20-30 seconds. Did we miss something
> > > > in the configuration, or are there any other settings we can
> > > > enable? Any suggestions will be of great help.
> > > >
> > > > * 100-200 concurrent writes to 25 node cluster
> > > > * #partitions 512
> > > > * cache backups = 2
> > > > * cache mode partitioned
> > > > * synchronizationMode: primarySync
> > > > * Off-heap caches
> > > > * Server nodes: 25
> > > > * RAM: 64G
> > > > * maxMemoryDirectSize: 4G
> > > > * Heap: 25G
> > > >
> > > > * persistenceEnabled: true
> > > > * data region size: 24GB
> > > > * checkpointBufferSize: 6GB
> > > > * walSegmentSize: 1G
> > > > * walBufferSize: 256MB
> > > > * walArchiveSize: 24G
> > > > * writeThrottlingEnabled: true
> > > > * checkpointFrequency: 60 sec
> > > > * checkpointThreads: 4
> > > > * DirectIO enabled: true
> > > >
> > > > SSDs attached:
> > > > work volume: 20G
> > > > wal volume: 15G
> > > > wal archive volume: 26G
> > > >
> > > > Checkpointing logs:
> > > >
> > > > [10:27:13,237][INFO][db-checkpoint-thread-#230][Checkpointer] Checkpoint
> > > > started [checkpointId=11749dc0-fd0d-4b5f-8b9a-510e774fec38,
> > > > startPtr=WALPointer [idx=26, fileOff=385214751, len=16683],
> > > > checkpointBeforeLockTime=29ms, checkpointLockWait=0ms,
> > > > checkpointListenersExecuteTime=2ms, checkpointLockHoldTime=3ms,
> > > > walCpRecordFsyncDuration=11ms, writeCheckpointEntryDuration=3ms,
> > > > splitAndSortCpPagesDuration=30ms, pages=40505, reason='timeout']
> > > > [10:27:13,242][INFO][sys-stripe-7-#8][PageMemoryImpl] Throttling is applied
> > > > to page modifications [percentOfPartTime=0.88, markDirty=2121 pages/sec,
> > > > checkpointWrite=1219 pages/sec, estIdealMarkDirty=0 pages/sec,
> > > > curDirty=0.00, maxDirty=0.02, avgParkTime=410172 ns, pages: (total=40505,
> > > > evicted=0, written=10, synced=0, cpBufUsed=133, cpBufTotal=1554645)]
> > > > [10:27:29,935][INFO][grid-timeout-worker-#30][IgniteKernal]
> > > > Metrics for local node (to disable set 'metricsLogFrequency' to 0)
> > > > ^-- Node [id=214f3c2b, uptime=00:45:00.227]
> > > > ^-- Cluster [hosts=45, CPUs=540, servers=25,
> > > > clients=20, topVer=75,
> > > > minorTopVer=0]
> > > > ^-- Network [addrs=[127.0.0.1, 192.168.98.141], discoPort=47500,
> > > > commPort=47100]
> > > > ^-- CPU [CPUs=12, curLoad=3.67%, avgLoad=0.82%, GC=0%]
> > > > ^-- Heap [used=5330MB, free=79.18%, comm=20480MB]
> > > > ^-- Off-heap memory [used=1019MB, free=95.92%, allocated=24775MB]
> > > > ^-- Page memory [pages=257976]
> > > > ^-- sysMemPlc region [type=internal, persistence=true, lazyAlloc=false,
> > > >   ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%,
> > > >   allocRam=99MB, allocTotal=0MB]
> > > > ^-- default region [type=default, persistence=true, lazyAlloc=true,
> > > >   ... initCfg=24576MB, maxCfg=24576MB, usedRam=1018MB, freeRam=95.86%,
> > > >   allocRam=24576MB, allocTotal=3820MB]
> > > > ^-- metastoreMemPlc region [type=internal, persistence=true, lazyAlloc=false,
> > > >   ... initCfg=40MB, maxCfg=100MB, usedRam=1MB, freeRam=98.78%,
> > > >   allocRam=0MB, allocTotal=1MB]
> > > > ^-- TxLog region [type=internal, persistence=true, lazyAlloc=false,
> > > >   ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%,
> > > >   allocRam=99MB, allocTotal=0MB]
> > > > ^-- volatileDsMemPlc region [type=user, persistence=false, lazyAlloc=true,
> > > >   ...
> > > >   initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%,
> > > >   allocRam=0MB]
> > > > ^-- Ignite persistence [used=3821MB]
> > > > ^-- Outbound messages queue [size=0]
> > > > ^-- Public thread pool [active=0, idle=0, qSize=0]
> > > > ^-- System thread pool [active=0, idle=7, qSize=0]
> > > > ^-- Striped thread pool [active=0, idle=12, qSize=0]
> > > > [10:27:38,261][INFO][db-checkpoint-thread-#230][Checkpointer] Checkpoint
> > > > finished [cpId=11749dc0-fd0d-4b5f-8b9a-510e774fec38, pages=40505,
> > > > markPos=WALPointer [idx=26, fileOff=385214751, len=16683],
> > > > walSegmentsCovered=[], markDuration=47ms, pagesWrite=25018ms, fsync=6ms,
> > > > total=25100ms]

--
Raymond Wilson
Trimble Distinguished Engineer, Civil Construction Software (CCS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com
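[Editor's note: for readers reproducing this setup, the knobs discussed in the thread map onto Ignite's `DataStorageConfiguration` / `DataRegionConfiguration` roughly as below. This is a hedged sketch against Ignite 2.x APIs, not the poster's actual config: the storage paths are hypothetical placeholders standing in for the dedicated work/WAL/archive volumes, and DirectIO is enabled separately by putting the ignite-direct-io module on the classpath rather than through this API.]

```java
import org.apache.ignite.configuration.DataRegionConfiguration;
import org.apache.ignite.configuration.DataStorageConfiguration;
import org.apache.ignite.configuration.IgniteConfiguration;

// Sketch of the storage settings discussed in this thread (Ignite 2.x).
// Values are taken from the thread; paths are hypothetical placeholders.
public class StorageConfigSketch {
    public static IgniteConfiguration build() {
        DataRegionConfiguration region = new DataRegionConfiguration()
            .setName("default")
            .setPersistenceEnabled(true)
            .setMaxSize(24L * 1024 * 1024 * 1024)                  // 24 GB data region
            .setCheckpointPageBufferSize(6L * 1024 * 1024 * 1024); // 6 GB checkpoint buffer

        DataStorageConfiguration storage = new DataStorageConfiguration()
            .setDefaultDataRegionConfiguration(region)
            .setWalSegmentSize(1024 * 1024 * 1024)          // 1 GB WAL segments
            .setMaxWalArchiveSize(24L * 1024 * 1024 * 1024) // 24 GB WAL archive
            .setWriteThrottlingEnabled(false)               // the 5x speedup observed above
            .setCheckpointFrequency(60_000)                 // 60 s (the default)
            .setCheckpointThreads(4)
            // Separate physical volumes, so PDS, WAL and archive IO don't compound:
            .setStoragePath("/mnt/ssd-work/pds")                // hypothetical path
            .setWalPath("/mnt/ssd-wal/wal")                     // hypothetical path
            .setWalArchivePath("/mnt/ssd-walarchive/archive");  // hypothetical path

        return new IgniteConfiguration().setDataStorageConfiguration(storage);
    }
}
```

As Ilya notes above, any change to these IO parameters is worth benchmarking against your own workload before it goes to production.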