Your checkpoint is writing ~40k pages to the store in 25 seconds.
Assuming default 4 KB pages, this means writing about 160 MB in 25 seconds, or
less than 7 MB/sec. On an SSD. Which seems slow.
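
For anyone who wants to redo that arithmetic against their own logs, here is a rough sketch. The page count and write time come from the "Checkpoint finished" line in the original post; the 4 KiB page size is my assumption (the default), and the function name is just something made up for illustration.

```python
# Figures taken from the "Checkpoint finished" log line in the original post.
PAGES_WRITTEN = 40505      # pages=40505
PAGES_WRITE_MS = 25018     # pagesWrite=25018ms

# Assumption: the default 4 KiB page size; change this if the cluster uses another value.
PAGE_SIZE_BYTES = 4096


def checkpoint_throughput_mb_per_sec(pages, duration_ms, page_size=PAGE_SIZE_BYTES):
    """Return checkpoint write throughput in megabytes (10**6 bytes) per second."""
    total_bytes = pages * page_size
    return total_bytes / 1e6 / (duration_ms / 1000)


total_mb = PAGES_WRITTEN * PAGE_SIZE_BYTES / 1e6
rate = checkpoint_throughput_mb_per_sec(PAGES_WRITTEN, PAGES_WRITE_MS)
print(f"{total_mb:.0f} MB in {PAGES_WRITE_MS / 1000:.1f} s = {rate:.1f} MB/sec")
```

The same function can be pointed at any other checkpoint by swapping in that log line's pages and pagesWrite values.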

But... your SSD volumes are really small, so perhaps you are partitioning a
single large SSD device into ~25 GB slices? If so, and all 25 nodes are
checkpointing at this rate, then that SSD is writing ~160 MB/second in total,
along with all kinds of overhead from lots of networked access to it. If you
are also using the same partitioned SSD device for the data, WAL & archive
then that would compound things.
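
As a back-of-the-envelope check on the shared-device scenario, the per-node rate can be scaled up by the node count. This is only a sketch: it assumes 4 KiB pages, and it assumes the hypothetical worst case where every node's volume is a slice of one device and all nodes checkpoint at the same time and at the same rate.

```python
# Per-node figures from the "Checkpoint finished" log line in the original post.
PAGES_WRITTEN = 40505
PAGES_WRITE_MS = 25018
SERVER_NODES = 25          # "Server nodes : 25" in the posted configuration

# Assumption: the default 4 KiB page size.
PAGE_SIZE_BYTES = 4096

per_node_mb_s = PAGES_WRITTEN * PAGE_SIZE_BYTES / 1e6 / (PAGES_WRITE_MS / 1000)

# Hypothetical worst case: all nodes share one physical device and
# checkpoint simultaneously at the same rate.
aggregate_mb_s = per_node_mb_s * SERVER_NODES

print(f"per node: {per_node_mb_s:.1f} MB/sec, "
      f"aggregate over {SERVER_NODES} nodes: {aggregate_mb_s:.0f} MB/sec")
```

This counts checkpoint page writes only; WAL and WAL archive traffic on the same device would come on top of it.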

If this is not the case, and there is a dedicated physical SSD for the data
store on each server, then 7 MB/sec sounds appalling!

Raymond.

On Mon, Apr 18, 2022 at 12:48 PM Ilya Korol <llivezk...@gmail.com> wrote:

> 1. We have DirectIO disabled: does enabling it impact performance?
> It should increase performance, but it's always worth running some
> benchmarks before using such features in production.
>
> 2. When we disabled throttling, we saw 5x better performance. Struggling
> load test completed in 1/5th of the time. What are the side effects if we
> keep it disabled?
> You can safely disable it. In this case throttling will still be
> present, but it will use less intelligent strategies (which may also
> work incorrectly from time to time).
>
> 3. Does MaxMemoryDirectSize have any relation to throughput rate?
> I don't know anything regarding this.
>
> From my perspective your checkpointBuffSize is sufficient; take a look at
> your log messages: cpBufUsed=133, cpBufTotal=1554645
>
> Increasing checkpoint frequency should spread IO pressure more evenly
> over time, but as I mentioned before, if you decide to increase or decrease
> any IO parameter you would be better off benchmarking how it impacts
> your setup.
>
> On 2022/04/17 05:43:57 Surinder Mehra wrote:
>  > Hey, thanks for replying. We haven't configured the storage path, so by
>  > default it should be in the work directory. work, wal and walarchive are
>  > all on SSDs. I have the following queries.
>  >
>  > 1. We have DirectIO disabled: does enabling it impact performance?
>  > 2. When we disabled throttling, we saw 5x better performance. Struggling
>  > load test completed in 1/5th of the time. What are the side effects if we
>  > keep it disabled?
>  > 3. Does MaxMemoryDirectSize have any relation to throughput rate?
>  > 4. Can the current configuration mentioned in the previous thread be scaled
>  > further? For example, by increasing the WalSegment size beyond 1GB along
>  > with the related walArchive size, checkpointbufferSize and the
>  > MaxMemoryDirectSize JVM parameter.
>  > 5. With throttling disabled, we now see the WalArchive size going beyond
>  > 50G (WalSegment size 1G and checkpoint buffer size 6G). Would decreasing
>  > checkpoint frequency and/or increasing the checkpoint thread count increase
>  > throughput, or would it adversely impact application writes? Currently the
>  > checkpointing frequency and threads are default.
>  >
>  >
>  > On Sun, Apr 17, 2022 at 6:33 AM Ilya Korol <ll...@gmail.com> wrote:
>  >
>  > > Hi, from my perspective this looks like your physical storage is not
>  > > fast enough to handle incoming writes. markDirty speed is about 2x
>  > > faster than checkpointWrite even in the presence of throttling. You've
>  > > mentioned that the Ignite work folder is stored on SSD, but what about
>  > > the PDS folder (DataStorageConfiguration.setStoragePath())?
>  > >
>  > > Btw, have you tested your setup with DirectIO disabled?
>  > >
>  > > On 2022/04/14 10:55:23 Surinder Mehra wrote:
>  > > > Hi,
>  > > > We have an application with Ignite thick clients which writes to Ignite
>  > > > caches on an Ignite grid deployed separately. Below is the Ignite
>  > > > configuration per node.
>  > > >
>  > > > With this configuration, we see throttling happening, and checkpointing
>  > > > time is between 20-30 seconds. Did we miss something in the
>  > > > configuration, or are there any other settings we can enable? Any
>  > > > suggestions will be of great help.
>  > > >
>  > > > * 100-200 concurrent writes to 25 node cluster
>  > > > * #partitions 512
>  > > > * cache backups = 2
>  > > > * cache mode partitioned
>  > > > * synchronizationMode : primary Sync
>  > > > * Off Heap caches
>  > > > * Server nodes : 25
>  > > > * RAM : 64G
>  > > > * maxmemoryDirectSize : 4G
>  > > > * Heap: 25G
>  > > >
>  > > > * persistenceEnabled: true
>  > > > * data region size : 24GB
>  > > > * checkPointingBufferSize: 6gb
>  > > > * walSegmentSize: 1G
>  > > > * walBufferSize : 256MB
>  > > > * walarchiveSize: 24G
>  > > > * writeThrottlingEnabled: true
>  > > > * checkPointingfreq : 60 sec
>  > > > * checkPointingThreads: 4
>  > > > * DirectIO enabled: true
>  > > >
>  > > > SSDs attached:
>  > > > work volume : 20G
>  > > > wal volume : 15G
>  > > > Wal archive volume : 26G
>  > > >
>  > > >
>  > > > Checkpointing logs:
>  > > >
>  > > > [10:27:13,237][INFO][db-checkpoint-thread-#230][Checkpointer] Checkpoint
>  > > > started [checkpointId=11749dc0-fd0d-4b5f-8b9a-510e774fec38,
>  > > > startPtr=WALPointer [idx=26, fileOff=385214751, len=16683],
>  > > > checkpointBeforeLockTime=29ms, checkpointLockWait=0ms,
>  > > > checkpointListenersExecuteTime=2ms, checkpointLockHoldTime=3ms,
>  > > > walCpRecordFsyncDuration=11ms, writeCheckpointEntryDuration=3ms,
>  > > > splitAndSortCpPagesDuration=30ms, pages=40505, reason='timeout']
>  > > > [10:27:13,242][INFO][sys-stripe-7-#8][PageMemoryImpl] Throttling is applied
>  > > > to page modifications [percentOfPartTime=0.88, markDirty=2121 pages/sec,
>  > > > checkpointWrite=1219 pages/sec, estIdealMarkDirty=0 pages/sec,
>  > > > curDirty=0.00, maxDirty=0.02, avgParkTime=410172 ns, pages: (total=40505,
>  > > > evicted=0, written=10, synced=0, cpBufUsed=133, cpBufTotal=1554645)]
>  > > > [10:27:29,935][INFO][grid-timeout-worker-#30][IgniteKernal]
>  > > > Metrics for local node (to disable set 'metricsLogFrequency' to 0)
>  > > > ^-- Node [id=214f3c2b, uptime=00:45:00.227]
>  > > > ^-- Cluster [hosts=45, CPUs=540, servers=25, clients=20, topVer=75,
>  > > > minorTopVer=0]
>  > > > ^-- Network [addrs=[127.0.0.1, 192.168.98.141], discoPort=47500,
>  > > > commPort=47100]
>  > > > ^-- CPU [CPUs=12, curLoad=3.67%, avgLoad=0.82%, GC=0%]
>  > > > ^-- Heap [used=5330MB, free=79.18%, comm=20480MB]
>  > > > ^-- Off-heap memory [used=1019MB, free=95.92%, allocated=24775MB]
>  > > > ^-- Page memory [pages=257976]
>  > > > ^-- sysMemPlc region [type=internal, persistence=true,
>  > > > lazyAlloc=false,
>  > > > ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=99.99%,
>  > > > allocRam=99MB, allocTotal=0MB]
>  > > > ^-- default region [type=default, persistence=true, lazyAlloc=true,
>  > > > ... initCfg=24576MB, maxCfg=24576MB, usedRam=1018MB, freeRam=95.86%,
>  > > > allocRam=24576MB, allocTotal=3820MB]
>  > > > ^-- metastoreMemPlc region [type=internal, persistence=true,
>  > > > lazyAlloc=false,
>  > > > ... initCfg=40MB, maxCfg=100MB, usedRam=1MB, freeRam=98.78%,
>  > > > allocRam=0MB, allocTotal=1MB]
>  > > > ^-- TxLog region [type=internal, persistence=true, lazyAlloc=false,
>  > > > ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%,
>  > > > allocRam=99MB, allocTotal=0MB]
>  > > > ^-- volatileDsMemPlc region [type=user, persistence=false,
>  > > > lazyAlloc=true,
>  > > > ... initCfg=40MB, maxCfg=100MB, usedRam=0MB, freeRam=100%,
>  > > > allocRam=0MB]
>  > > > ^-- Ignite persistence [used=3821MB]
>  > > > ^-- Outbound messages queue [size=0]
>  > > > ^-- Public thread pool [active=0, idle=0, qSize=0]
>  > > > ^-- System thread pool [active=0, idle=7, qSize=0]
>  > > > ^-- Striped thread pool [active=0, idle=12, qSize=0]
>  > > > [10:27:38,261][INFO][db-checkpoint-thread-#230][Checkpointer] Checkpoint
>  > > > finished [cpId=11749dc0-fd0d-4b5f-8b9a-510e774fec38, pages=40505,
>  > > > markPos=WALPointer [idx=26, fileOff=385214751, len=16683],
>  > > > walSegmentsCovered=[], markDuration=47ms, pagesWrite=25018ms, fsync=6ms,
>  > > > total=25100ms]
>  > > >
>  > >
>  >
>


-- 
<http://www.trimble.com/>
Raymond Wilson
Trimble Distinguished Engineer, Civil Construction Software (CCS)
11 Birmingham Drive | Christchurch, New Zealand
raymond_wil...@trimble.com

