Thanks, Zhenya.

Currently we call Ignition.Stop() with the flag that allows jobs to
complete. I assume that when using deactivation we don't need to call
that, or is it still a good idea as a belt-and-braces shutdown for the
grid?
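For reference, a minimal Java sketch of the deactivate-then-stop sequence
in question (the Ignition.Stop() above is the .NET form; the config path
and the exact ordering here are assumptions for illustration, not a
confirmed recommendation):

```java
import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.cluster.ClusterState;

public class GracefulShutdown {
    public static void main(String[] args) {
        // Hypothetical config path, for illustration only.
        Ignite ignite = Ignition.start("config/ignite-server.xml");

        // Deactivation checkpoints dirty pages and stops caches
        // cluster-wide, so a subsequent start should not need to
        // replay the WAL.
        ignite.cluster().state(ClusterState.INACTIVE);

        // cancel=false waits for in-flight jobs rather than cancelling
        // them -- the belt-and-braces step discussed above.
        Ignition.stop(false);
    }
}
```

This needs a running cluster (and the Ignite jars on the classpath), so it
is a sketch of the call sequence rather than a standalone program.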

Raymond

On Wed, Jan 13, 2021 at 8:28 PM Zhenya Stanilovsky <arzamas...@mail.ru>
wrote:

>
> Hi Zhenya,
>
> Thanks for confirming performing checkpoints more often will help here.
>
> Hi Raymond!
>
>
> I have established this configuration so will experiment with the
> settings a little.
>
> On a related note, is there any way to automatically trigger a checkpoint,
> for instance as a pre-shutdown activity?
>
>
> If you shut down your cluster gracefully, i.e. with deactivation [1],
> a subsequent start will not trigger WAL reading.
>
> [1]
> https://www.gridgain.com/docs/latest/administrators-guide/control-script#deactivating-cluster
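> The deactivation step from [1] can be scripted as part of a
> deploy/restart procedure. A sketch (check the flags against your
> Ignite/GridGain version; IGNITE_HOME is assumed to point at the
> installation):
>
> ```shell
> # Deactivate so that dirty pages are checkpointed before the nodes
> # are stopped; the next start should then avoid WAL replay.
> $IGNITE_HOME/bin/control.sh --deactivate --yes
>
> # ...stop the nodes, deploy, restart them...
>
> # Reactivate once the new topology is up.
> $IGNITE_HOME/bin/control.sh --activate --yes
> ```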
>
>
> Checkpoints seem to be much faster than the process of applying WAL
> updates.
>
> Raymond.
>
> On Wed, Jan 13, 2021 at 8:07 PM Zhenya Stanilovsky <arzamas...@mail.ru>
> wrote:
>
> We have noticed that the startup time for our server nodes has been
> slowly increasing over time as the amount of data stored in the
> persistent store grows.
>
> This appears to be closely related to recovery of WAL changes that were
> not checkpointed at the time the node was stopped.
>
> After enabling debug logging we see that the WAL file is scanned, and for
> every cache, all partitions in the cache are examined, and if there are any
> uncommitted changes in the WAL file then the partition is updated (I assume
> this requires reading of the partition itself as a part of this process).
>
> We now have ~150 GB of data in our persistent store and we see WAL
> update times of 5-10 minutes, during which the node is unavailable.
>
> We use fairly large WAL files (512 MB) and 10 segments, with WAL
> archiving enabled.
>
> We anticipate data in persistent storage to grow to Terabytes, and if the
> startup time continues to grow as storage grows then this makes deploys and
> restarts difficult.
>
> Until now we have been using the default checkpoint timeout of 3
> minutes, which may mean we have significant un-checkpointed data in
> the WAL files. We are moving to a 1 minute checkpoint frequency but
> don't yet know if this improves startup times. We also use the default
> 1024 partitions per cache, though some partitions may be large.
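> For context, the WAL and checkpoint settings described above map onto
> Ignite's DataStorageConfiguration roughly as follows (a sketch using
> the figures from this thread; defaults and exact setter behaviour
> should be verified against your Ignite version):
>
> ```java
> import org.apache.ignite.configuration.DataStorageConfiguration;
> import org.apache.ignite.configuration.IgniteConfiguration;
>
> DataStorageConfiguration storageCfg = new DataStorageConfiguration();
>
> // Checkpoint every 60 s instead of the 180 s default, so that less
> // un-checkpointed data accumulates in the WAL between restarts.
> storageCfg.setCheckpointFrequency(60_000);
>
> // 10 segments of 512 MB each, as described above.
> storageCfg.setWalSegments(10);
> storageCfg.setWalSegmentSize(512 * 1024 * 1024);
>
> IgniteConfiguration cfg = new IgniteConfiguration();
> cfg.setDataStorageConfiguration(storageCfg);
> ```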
>
> Can anyone confirm this is expected behaviour, and recommend ways of
> resolving it?
>
> Will reducing the checkpointing interval help?
>
>
> Yes, it will help. Check
> https://cwiki.apache.org/confluence/display/IGNITE/Ignite+Persistent+Store+-+under+the+hood
>
> Is the entire content of a partition read while applying WAL changes?
>
>
> I don't think so; maybe someone else can comment here?
>
> Does anyone else have this issue?
>
> Thanks,
> Raymond.
>
>
> --
> Raymond Wilson
> Solution Architect, Civil Construction Software Systems (CCSS)
> 11 Birmingham Drive | Christchurch, New Zealand
> raymond_wil...@trimble.com


