Hello,

If you're copying a snapshot part to a new node, you have to make sure that the /ignite/work/cp, /ignite/wal, and /ignite/walarchive directories are empty before the node starts. Is that true in your case?
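In case it helps, here is a minimal sketch (Python, standard library only) of the kind of check an init container could run before the node starts. The three paths are the ones from the configuration quoted in this thread; the function names are made up for illustration and are not part of Ignite.

```python
import os
import shutil
from typing import Iterable, List

# Paths taken from the configuration quoted in this thread; adjust for your setup.
STALE_DIRS = ["/ignite/work/cp", "/ignite/wal", "/ignite/walarchive"]


def non_empty_dirs(paths: Iterable[str]) -> List[str]:
    """Return the directories that exist and still contain entries."""
    return [p for p in paths if os.path.isdir(p) and os.listdir(p)]


def clear_dirs(paths: Iterable[str]) -> None:
    """Delete the contents of each existing directory, keeping the directory itself.

    Hypothetical helper: only call it when the old WAL/checkpoint files are
    really meant to be discarded.
    """
    for path in paths:
        if not os.path.isdir(path):
            continue
        for name in os.listdir(path):
            full = os.path.join(path, name)
            if os.path.isdir(full) and not os.path.islink(full):
                shutil.rmtree(full)
            else:
                os.remove(full)


if __name__ == "__main__":
    leftover = non_empty_dirs(STALE_DIRS)
    if leftover:
        # Report only; deciding whether to clear is left to the operator.
        print("Not empty before node start:", leftover)
        raise SystemExit(1)
    print("All checked directories are empty.")
```

Run as a script, it only reports and exits non-zero when something is left over; `clear_dirs` is kept separate so that deleting data stays an explicit decision.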
On Fri, 27 May 2022 at 10:29, Surinder Mehra <redni...@gmail.com> wrote:
>
> Hi,
> Please find ignite config and error log below
>
> config :
> <property name="gridLogger">
>     <bean class="org.apache.ignite.logger.log4j.Log4JLogger">
>         <constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/ignite-log4j.xml"/>
>     </bean>
> </property>
> <property name="peerClassLoadingEnabled" value="true"/>
> <property name="deploymentMode" value="CONTINUOUS"/>
> <property name="workDirectory" value="/ignite/work"/>
> <property name="snapshotPath" value="/ignite/snapshots"/>
> <property name="queryThreadPoolSize" value="8"/>
>
> <property name="dataStorageConfiguration">
>     <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>         <property name="walBufferSize" value="#{128L * 1024 * 1024}"/>
>         <property name="walSegmentSize" value="#{512L * 1024 * 1024}"/>
>         <property name="maxWalArchiveSize" value="#{2L * 1024 * 1024 * 1024}"/>
>         <property name="checkpointFrequency" value="#{60 * 1000}" />
>         <property name="writeThrottlingEnabled" value="true"/>
>         <property name="defaultDataRegionConfiguration">
>             <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>                 <property name="persistenceEnabled" value="true"/>
>                 <property name="initialSize" value="#{100L * 1024 * 1024}"/>
>                 <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
>
>                 <!--https://ignite.apache.org/docs/latest/persistence/persistence-tuning#adjusting-checkpointing-buffer-size-->
>                 <property name="checkpointPageBufferSize" value="#{512L * 1024 * 1024}"/>
>                 <!--<property name="pageReplacementMode" value="SEGMENTED_LRU"/>-->
>             </bean>
>         </property>
>         <property name="walPath" value="/ignite/wal"/>
>         <property name="walArchivePath" value="/ignite/walarchive"/>
>     </bean>
> </property>
>
>
> Error log:
>
> at
> org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2763)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:870)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3200)
> at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1116)
> at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1799)
> at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1721)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1160)
> at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1054)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:940)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:839)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:709)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
> at org.apache.ignite.Ignition.start(Ignition.java:353)
> ... 1 more
> Failed to start grid: WAL history is too short [descs=[
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000060.wal, idx=60],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000061.wal, idx=61],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000062.wal, idx=62],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000063.wal, idx=63],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000064.wal, idx=64],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000065.wal, idx=65],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000066.wal, idx=66],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000067.wal, idx=67],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000068.wal, idx=68],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000069.wal, idx=69],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000070.wal, idx=70],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000071.wal, idx=71],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000072.wal, idx=72],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000073.wal, idx=73],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000074.wal, idx=74],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000075.wal, idx=75],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000076.wal, idx=76],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000077.wal, idx=77],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000078.wal, idx=78],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000079.wal, idx=79],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000080.wal, idx=80],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000081.wal, idx=81],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000082.wal, idx=82],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000083.wal, idx=83],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000084.wal, idx=84],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000085.wal, idx=85],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000086.wal, idx=86],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000087.wal, idx=87],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000088.wal, idx=88],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000089.wal, idx=89],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000090.wal, idx=90],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000091.wal, idx=91],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000092.wal,
> idx=92],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000093.wal, idx=93],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000094.wal, idx=94]], start=WALPointer [idx=0, fileOff=0, len=0]]
>
>
> On Thu, May 26, 2022 at 8:56 PM Николай Ижиков <nizhi...@apache.org> wrote:
>>
>> Can you please send your config and the full log file that contains the error message?
>>
>> On 26 May 2022, at 17:50, Surinder Mehra <redni...@gmail.com> wrote:
>>
>> Hello,
>> I upgraded to 2.13.0 and I am able to take sync snapshots now. However, I ran into another problem while restoring from a snapshot using the manual steps mentioned in the documentation.
>>
>> We run Ignite as a StatefulSet on a Kubernetes cluster, so when we scale it to N nodes, it brings up one node at a time.
>>
>> Now I am trying to attach an init container which will clear the db directory in the work directory, copy the /db directory from the snapshot into the work directory, and then start the main container which runs Ignite.
>>
>> It works well on a single node: it is able to start the cluster with the snapshot data.
>>
>> When I start multiple nodes, the init container runs on each of them as the first step. Since the nodes start one at a time, it runs into an error saying "too small WAL segments data".
>>
>> I suppose that could be because the 2nd node is still in the init step while the first one is already running. There are a few nodes which haven't started yet and are waiting for the 2nd node to reach the running state.
>>
>> Any idea how we can make the main containers wait until all init containers are completed?
>>
>> Asking this here as it's related to Ignite setup in Kubernetes.
>>
>> Any help will be appreciated. Thanks
>>
>> On Wed, 25 May 2022, 00:04 Surinder Mehra, <redni...@gmail.com> wrote:
>>>
>>> Thanks a lot. I will try this.
>>>
>>> On Tue, 24 May 2022, 23:50 Николай Ижиков, <nizhi...@apache.org> wrote:
>>>>
>>>> > Does it ensure consistency while copying data that is being updated in parallel by application writes?
>>>>
>>>> Yes.
>>>>
>>>> From the documentation:
>>>>
>>>> «An Ignite snapshot includes a consistent cluster-wide copy of all data records persisted on disk and some other files needed for a restore procedure.»
>>>>
>>>> > will this be a stop-the-world process?
>>>>
>>>> No.
>>>>
>>>>
>>>> On 24 May 2022, at 21:17, Surinder Mehra <redni...@gmail.com> wrote:
>>>>
>>>> Hi
>>>> Thanks for the reply.
>>>>
>>>> #1: So it's not a stop-the-world task. Does it ensure consistency while copying data that is being updated in parallel by application writes? Or does it mark the data to be copied and ignore further updates to it?
>>>>
>>>> #2:
>>>> I will try a sync snapshot. But just to confirm, will this be a stop-the-world process? I couldn't find anything about it on the documentation page.
>>>>
>>>> On Tue, 24 May 2022, 23:12 Николай Ижиков, <nizhi...@apache.org> wrote:
>>>>>
>>>>> Hello, Mehra.
>>>>>
>>>>> > 1. Is it a stop-the-world process?
>>>>>
>>>>> No, you can perform any actions.
>>>>> Note that topology changes will cancel the snapshot creation process.
>>>>>
>>>>> > 2. If so, is it stop-the-world only during command execution (500 millis) or until the snapshot data is fully copied (which takes many minutes)?
>>>>>
>>>>> Please take a look at the `--sync` option of the create snapshot command (you can see the help in the `control.sh` output).
>>>>> The `EVT_CLUSTER_SNAPSHOT_FINISHED` event is raised when snapshot creation finishes.
>>>>>
>>>>> > 3. Is there a way to speed this up other than increasing snapshot threads?
>>>>>
>>>>> Stop write operations.
>>>>> The less you change, the quicker the snapshot will be created.
>>>>>
>>>>> On 24 May 2022, at 20:12, Surinder Mehra <redni...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>> I have a 3-node Ignite cluster; each node has a 60G work directory (EBS), and I need to create snapshots.
>>>>> I followed the steps to create snapshots and ran the create snapshot command using the control utility. The command completed in 500 millis, but the snapshot directory only had 400 MB of data. Later I realised the directory size had grown to 30G. I suppose it would reach the size of the work directory.
>>>>>
>>>>>
>>>>> I have a few questions.
>>>>> 1. Is it a stop-the-world process?
>>>>> 2. If so, is it stop-the-world only during command execution (500 millis) or until the snapshot data is fully copied (which takes many minutes)?
>>>>> 3. Is there a way to speed this up other than increasing snapshot threads?
>>>>>
>>>>>
>>>>
>>