Hello,

If you're copying a snapshot part to the new node, you have to make
sure that the /ignite/work/cp, /ignite/wal and /ignite/walarchive
directories are empty before the node starts. Is that the case for
you?
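
For reference, a check like this could run (for example, in the init
container) before the node starts. It is only a minimal Python sketch:
the directory paths are taken from the config quoted below, and the
helper name `non_empty_dirs` is made up for illustration; it is not part
of Ignite.

```python
from pathlib import Path

# Directories that should be empty before the node starts.
# Assumption: these match workDirectory, walPath and walArchivePath
# from the quoted config.
DIRS_TO_CHECK = ["/ignite/work/cp", "/ignite/wal", "/ignite/walarchive"]


def non_empty_dirs(paths):
    """Return the paths that exist as directories and contain any entry."""
    leftover = []
    for p in map(Path, paths):
        # A missing directory counts as empty; any file or subdirectory
        # (e.g. an old node00-... WAL folder) makes it non-empty.
        if p.is_dir() and any(p.iterdir()):
            leftover.append(str(p))
    return leftover


if __name__ == "__main__":
    leftover = non_empty_dirs(DIRS_TO_CHECK)
    if leftover:
        # Exit with a non-zero status so a wrapping script can stop here.
        raise SystemExit("Not empty, clear before start: " + ", ".join(leftover))
    print("Checkpoint and WAL directories are empty or absent.")
```

If the script exits with an error, old WAL or checkpoint files are
still in place next to the copied snapshot data.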

On Fri, 27 May 2022 at 10:29, Surinder Mehra <redni...@gmail.com> wrote:
>
> Hi,
> Please find ignite config and error log below
>
> config :
>         <property name="gridLogger">
>             <bean class="org.apache.ignite.logger.log4j.Log4JLogger">
>                 <constructor-arg type="java.lang.String" value="/opt/ignite/apache-ignite/config/ignite-log4j.xml"/>
>             </bean>
>         </property>
>         <property name="peerClassLoadingEnabled" value="true"/>
>         <property name="deploymentMode" value="CONTINUOUS"/>
>         <property name="workDirectory" value="/ignite/work"/>
>         <property name="snapshotPath" value="/ignite/snapshots"/>
>         <property name="queryThreadPoolSize" value="8"/>
>
>         <property name="dataStorageConfiguration">
>             <bean class="org.apache.ignite.configuration.DataStorageConfiguration">
>                 <property name="walBufferSize" value="#{128L * 1024 * 1024}"/>
>                 <property name="walSegmentSize" value="#{512L * 1024 * 1024}"/>
>                 <property name="maxWalArchiveSize" value="#{2L * 1024 * 1024 * 1024}"/>
>                 <property name="checkpointFrequency" value="#{60 * 1000}" />
>                 <property name="writeThrottlingEnabled" value="true"/>
>                 <property name="defaultDataRegionConfiguration">
>                     <bean class="org.apache.ignite.configuration.DataRegionConfiguration">
>                         <property name="persistenceEnabled" value="true"/>
>                         <property name="initialSize" value="#{100L * 1024 * 1024}"/>
>                         <property name="maxSize" value="#{2L * 1024 * 1024 * 1024}"/>
>                         <!--https://ignite.apache.org/docs/latest/persistence/persistence-tuning#adjusting-checkpointing-buffer-size-->
>                         <property name="checkpointPageBufferSize" value="#{512L * 1024 * 1024}"/>
>                         <!--<property name="pageReplacementMode" value="SEGMENTED_LRU"/>-->
>                     </bean>
>                 </property>
>                 <property name="walPath" value="/ignite/wal"/>
>                 <property name="walArchivePath" value="/ignite/walarchive"/>
>             </bean>
>         </property>
>
>
> Error log:
>
> at org.apache.ignite.internal.processors.cache.persistence.wal.FileWriteAheadLogManager$RecordsIterator.access$1000(FileWriteAheadLogManager.java:2763)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.readMetastore(GridCacheDatabaseSharedManager.java:870)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheDatabaseSharedManager.notifyMetaStorageSubscribersOnReadyForRead(GridCacheDatabaseSharedManager.java:3200)
> at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1116)
> at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:1799)
> at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1721)
> at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1160)
> at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1054)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:940)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:839)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:709)
> at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:678)
> at org.apache.ignite.Ignition.start(Ignition.java:353)
> ... 1 more
> Failed to start grid: WAL history is too short [descs=[
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000060.wal, idx=60],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000061.wal, idx=61],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000062.wal, idx=62],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000063.wal, idx=63],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000064.wal, idx=64],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000065.wal, idx=65],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000066.wal, idx=66],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000067.wal, idx=67],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000068.wal, idx=68],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000069.wal, idx=69],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000070.wal, idx=70],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000071.wal, idx=71],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000072.wal, idx=72],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000073.wal, idx=73],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000074.wal, idx=74],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000075.wal, idx=75],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000076.wal, idx=76],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000077.wal, idx=77],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000078.wal, idx=78],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000079.wal, idx=79],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000080.wal, idx=80],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000081.wal, idx=81],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000082.wal, idx=82],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000083.wal, idx=83],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000084.wal, idx=84],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000085.wal, idx=85],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000086.wal, idx=86],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000087.wal, idx=87],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000088.wal, idx=88],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000089.wal, idx=89],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000090.wal, idx=90],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000091.wal, idx=91],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000092.wal, idx=92],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000093.wal, idx=93],
> FileDescriptor [file=/ignite/walarchive/node00-44a0ade8-60c2-4190-aac3-7fb465129efe/0000000000000094.wal, idx=94]
> ], start=WALPointer [idx=0, fileOff=0, len=0]]
>
>
> On Thu, May 26, 2022 at 8:56 PM Николай Ижиков <nizhi...@apache.org> wrote:
>>
>> Could you please send your config and the full log file that contains
>> the error message?
>>
>> On 26 May 2022, at 17:50, Surinder Mehra <redni...@gmail.com> wrote:
>>
>> Hello,
>> I upgraded to 2.13.0 and I am able to take sync snapshots now. However, I
>> ran into another problem while restoring from a snapshot using the manual
>> steps mentioned in the documentation.
>>
>> We run Ignite as a StatefulSet on a Kubernetes cluster, so when we scale
>> it to N nodes, it brings up one node at a time.
>>
>> Now I am trying to attach an init container which will clear the db
>> directory in the work directory, copy the /db directory from the snapshot
>> into the work directory, and then let the main container, which runs
>> Ignite, start.
>>
>> It works well on a single node: it is able to start the cluster with the
>> snapshot data.
>>
>> When I start multiple nodes, the init container runs on each of them as
>> the first step. Since the nodes start one at a time, it runs into an error
>> saying "too small WAL segments data".
>>
>> I suppose that could be because the 2nd node is still in the init step
>> while the first one is already running. A few nodes haven't started yet
>> and are waiting for the 2nd node to reach the running state.
>>
>> Any idea how we can make the main containers wait until all init
>> containers have completed?
>>
>> Asking this here as it's related to Ignite setup in Kubernetes.
>>
>> Any help will be appreciated. Thanks
>>
>> On Wed, 25 May 2022, 00:04 Surinder Mehra, <redni...@gmail.com> wrote:
>>>
>>> Thanks a lot. I will try this.
>>>
>>> On Tue, 24 May 2022, 23:50 Николай Ижиков, <nizhi...@apache.org> wrote:
>>>>
>>>> > Does it ensure consistency while copying data that is concurrently
>>>> > being updated by application writes
>>>>
>>>> Yes.
>>>>
>>>> From the documentation:
>>>>
>>>> «An Ignite snapshot includes a consistent cluster-wide copy of all data 
>>>> records persisted on disk and some other files needed for a restore 
>>>> procedure.»
>>>>
>>>> > will this be a stop the world process
>>>>
>>>> No.
>>>>
>>>>
>>>> On 24 May 2022, at 21:17, Surinder Mehra <redni...@gmail.com> wrote:
>>>>
>>>> Hi,
>>>> Thanks for the reply.
>>>>
>>>> #1: So it's not a stop-the-world task. Does it ensure consistency while
>>>> copying data that is concurrently being updated by application writes?
>>>> Or does it mark the data to be copied and ignore further updates to it?
>>>>
>>>> #2:
>>>> I will try the sync snapshot. But just to confirm, will this be a
>>>> stop-the-world process? I couldn't find anything about it on the
>>>> documentation page.
>>>>
>>>> On Tue, 24 May 2022, 23:12 Николай Ижиков, <nizhi...@apache.org> wrote:
>>>>>
>>>>> Hello, Mehra.
>>>>>
>>>>> > 1. Is it stop the world process.
>>>>>
>>>>> No, you can perform any actions.
>>>>> Note that topology changes will cancel the snapshot creation process.
>>>>>
>>>>> > 2. If so, is it stop the world only during command execution
>>>>> > (500millis) or until snapshot Data is fully copied(takes many minutes)
>>>>> > to complete.
>>>>>
>>>>> Please take a look at the `--sync` option of the create snapshot command
>>>>> (you can see the help in the `control.sh` output).
>>>>> The `EVT_CLUSTER_SNAPSHOT_FINISHED` event is raised when snapshot
>>>>> creation finishes.
>>>>>
>>>>> > 3. Is there a way around to speed up this other than increasing 
>>>>> > snapshot threads
>>>>>
>>>>> Stop write operations.
>>>>> The less data you change, the quicker the snapshot will be created.
>>>>>
>>>>> On 24 May 2022, at 20:12, Surinder Mehra <redni...@gmail.com> wrote:
>>>>>
>>>>> Hi,
>>>>> I have a 3-node Ignite cluster; each node has a 60 GB work directory
>>>>> (EBS), and I need to create snapshots.
>>>>> I followed the steps to create snapshots and ran the create snapshot
>>>>> command using the control utility. The command completed in 500 ms, but
>>>>> the snapshot directory only had 400 MB of data. Later I realised the
>>>>> directory size had grown to 30 GB. I suppose it would reach the size of
>>>>> the work directory.
>>>>>
>>>>>
>>>>> I have a few questions.
>>>>> 1. Is it a stop-the-world process?
>>>>> 2. If so, is it stop-the-world only during command execution (500 ms)
>>>>> or until the snapshot data is fully copied (which takes many minutes)?
>>>>> 3. Is there a way to speed this up other than increasing the number of
>>>>> snapshot threads?
>>>>>
>>>>>
>>>>
>>
