Re: Abnormal termination of nodes with native persistence enabled

李玉珏 Sun, 13 Jan 2019 23:21:40 -0800

Hi,

The console log is below.

Note that if all nodes are killed, they all start successfully and the data is intact. Only when a single node fails is it unable to rejoin the cluster after restarting.


Thanks!

-----------------------------log start------------------------------

2019-01-14T10:33:35,438][INFO ][main][IgniteKernal]

>>>    __________  ________________
>>>   /  _/ ___/ |/ /  _/_  __/ __/
>>>  _/ // (7 7    // /  / / / _/
>>> /___/\___/_/|_/___/ /_/ /___/
>>>
>>> ver. 2.6.0#20180710-sha1:669feacc
>>> 2018 Copyright(C) Apache Software Foundation
>>>
>>> Ignite documentation: http://ignite.apache.org

2019-01-14T10:33:35,441][INFO ][main][IgniteKernal] Config URL: file:/opt/ignite/apache-ignite-fabric-2.6.0-bin/config/practice-config.xml
2019-01-14T10:33:35,458][INFO ][main][IgniteKernal] IgniteConfiguration [igniteInstanceName=null, pubPoolSize=8, svcPoolSize=8, callbackPoolSize=8, stripedPoolSize=8, sysPoolSize=8, mgmtPoolSize=4, igfsPoolSize=4, dataStreamerPoolSize=8, utilityCachePoolSize=8, utilityCacheKeepAliveTime=60000, p2pPoolSize=2, qryPoolSize=8, igniteHome=/opt/ignite/apache-ignite-fabric-2.6.0-bin, igniteWorkDir=/opt/ignite/apache-ignite-fabric-2.6.0-bin/work, mbeanSrv=com.sun.jmx.mbeanserver.JmxMBeanServer@6f94fa3e, nodeId=142b548a-6480-4c31-9559-7d7b2092175c, marsh=org.apache.ignite.internal.binary.BinaryMarshaller@4e0ae11f, marshLocJobs=false, daemon=false, p2pEnabled=true, netTimeout=5000, sndRetryDelay=1000, sndRetryCnt=3, metricsHistSize=10000, metricsUpdateFreq=2000, metricsExpTime=9223372036854775807, discoSpi=TcpDiscoverySpi [addrRslvr=null, sockTimeout=15000, ackTimeout=60000, marsh=null, reconCnt=10, reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null], segPlc=STOP, segResolveAttempts=2, waitForSegOnStart=true, allResolversPassReq=true, segChkFreq=10000, commSpi=TcpCommunicationSpi [connectGate=null, connPlc=null, enableForcibleNodeKill=false, enableTroubleshootingLog=false, srvLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationSpi$2@4c2bb6e0, locAddr=null, locHost=null, locPort=47100, locPortRange=100, shmemPort=-1, directBuf=true, directSndBuf=false, idleConnTimeout=600000, connTimeout=5000, maxConnTimeout=600000, reconCnt=10, sockSndBuf=32768, sockRcvBuf=32768, msgQueueLimit=0, slowClientQueueLimit=0, nioSrvr=null, shmemSrv=null, usePairedConnections=false, connectionsPerNode=1, tcpNoDelay=true, filterReachableAddresses=false, ackSndThreshold=32, unackedMsgsBufSize=0, sockWriteTimeout=2000, lsnr=null, boundTcpPort=-1, boundTcpShmemPort=-1, selectorsCnt=4, selectorSpins=0, addrRslvr=null, ctxInitLatch=java.util.concurrent.CountDownLatch@3e62d773[Count = 1], stopping=false, metricsLsnr=org.apache.ignite.spi.communication.tcp.TcpCommunicationMetricsListener@4ef74c30], evtSpi=org.apache.ignite.spi.eventstorage.NoopEventStorageSpi@7283d3eb, colSpi=NoopCollisionSpi [], deploySpi=LocalDeploymentSpi [lsnr=null], indexingSpi=org.apache.ignite.spi.indexing.noop.NoopIndexingSpi@47c81abf, addrRslvr=null, clientMode=false, rebalanceThreadPoolSize=1, txCfg=org.apache.ignite.configuration.TransactionConfiguration@776a6d9b, cacheSanityCheckEnabled=true, discoStartupDelay=60000, deployMode=PRIVATE, p2pMissedCacheSize=100, locHost=null, timeSrvPortBase=31100, timeSrvPortRange=100, failureDetectionTimeout=10000, clientFailureDetectionTimeout=30000, metricsLogFreq=60000, hadoopCfg=null, connectorCfg=org.apache.ignite.configuration.ConnectorConfiguration@21d03963, odbcCfg=null, warmupClos=null, atomicCfg=AtomicConfiguration [seqReserveSize=1000, cacheMode=PARTITIONED, backups=1, aff=null, grpName=null], classLdr=null, sslCtxFactory=null, platformCfg=null, binaryCfg=null, memCfg=null, pstCfg=null, dsCfg=DataStorageConfiguration [sysRegionInitSize=41943040, sysCacheMaxSize=104857600, pageSize=0, concLvl=4, dfltDataRegConf=DataRegionConfiguration [name=default, maxSize=34359738368, initSize=268435456, swapPath=null, pageEvictionMode=DISABLED, evictionThreshold=0.9, emptyPagesPoolSize=100, metricsEnabled=false, metricsSubIntervalCount=5, metricsRateTimeInterval=60000, persistenceEnabled=true, checkpointPageBufSize=0], storagePath=/data/ignite/storage, checkpointFreq=180000, lockWaitTime=10000, checkpointThreads=4, checkpointWriteOrder=SEQUENTIAL, walHistSize=20, walSegments=10, walSegmentSize=67108864, walPath=/data/ignite/wal, walArchivePath=db/wal/archive, metricsEnabled=false, walMode=LOG_ONLY, walTlbSize=131072, walBuffSize=0, walFlushFreq=2000, walFsyncDelay=1000, walRecordIterBuffSize=67108864, alwaysWriteFullPages=false, fileIOFactory=org.apache.ignite.internal.processors.cache.persistence.file.AsyncFileIOFactory@18ece7f4, metricsSubIntervalCnt=5, metricsRateTimeInterval=60000, walAutoArchiveAfterInactivity=-1, writeThrottlingEnabled=false, walCompactionEnabled=true], activeOnStart=true, autoActivation=true, longQryWarnTimeout=3000, sqlConnCfg=null, cliConnCfg=ClientConnectorConfiguration [host=10.37.184.213, port=10800, portRange=100, sockSndBufSize=0, sockRcvBufSize=0, tcpNoDelay=true, maxOpenCursorsPerConn=128, threadPoolSize=8, idleTimeout=0, jdbcEnabled=true, odbcEnabled=true, thinCliEnabled=true, sslEnabled=false, useIgniteSslCtxFactory=true, sslClientAuth=false, sslCtxFactory=null], authEnabled=false, failureHnd=RestartProcessFailureHandler [], commFailureRslvr=null]
2019-01-14T10:33:35,459][INFO ][main][IgniteKernal] Daemon mode: off
2019-01-14T10:33:35,460][INFO ][main][IgniteKernal] OS: Linux 3.10.0-229.el7.x86_64 amd64

2019-01-14T10:33:35,460][INFO ][main][IgniteKernal] OS user: root
2019-01-14T10:33:35,461][INFO ][main][IgniteKernal] PID: 25000

2019-01-14T10:33:35,461][INFO ][main][IgniteKernal] Language runtime: Java Platform API Specification ver. 1.8
2019-01-14T10:33:35,461][INFO ][main][IgniteKernal] VM information: Java(TM) SE Runtime Environment 1.8.0_151-b12 Oracle Corporation Java HotSpot(TM) 64-Bit Server VM 25.151-b12

2019-01-14T10:33:35,463][INFO ][main][IgniteKernal] VM total memory: 8.0GB

2019-01-14T10:33:35,463][INFO ][main][IgniteKernal] Remote Management [restart: on, REST: on, JMX (remote: on, port: 49224, auth: off, ssl: off)]
2019-01-14T10:33:35,464][INFO ][main][IgniteKernal] Logger: Log4J2Logger [quiet=false, config=config/log4j2.xml]
2019-01-14T10:33:35,464][INFO ][main][IgniteKernal] IGNITE_HOME=/opt/ignite/apache-ignite-fabric-2.6.0-bin
2019-01-14T10:33:35,464][INFO ][main][IgniteKernal] VM arguments: [-Xms1g, -Xmx8g, -XX:+AggressiveOpts, -XX:MaxMetaspaceSize=384m, -XX:+AlwaysPreTouch, -XX:+ScavengeBeforeFullGC, -XX:+DisableExplicitGC, -XX:+UseG1GC, -Xss4m, -Djava.net.preferIPv4Stack=true, -DIGNITE_QUIET=false, -DIGNITE_SUCCESS_FILE=/opt/ignite/apache-ignite-fabric-2.6.0-bin/work/ignite_success_911c4c15-f1f8-49b4-9a92-85e54059d4d4, -Dcom.sun.management.jmxremote, -Dcom.sun.management.jmxremote.port=49224, -Dcom.sun.management.jmxremote.authenticate=false, -Dcom.sun.management.jmxremote.ssl=false, -DIGNITE_HOME=/opt/ignite/apache-ignite-fabric-2.6.0-bin, -DIGNITE_PROG_NAME=/opt/ignite/apache-ignite-fabric-2.6.0-bin/bin/ignite.sh]
2019-01-14T10:33:35,465][INFO ][main][IgniteKernal] System cache's DataRegion size is configured to 40MB. Use DataStorageConfiguration.systemCacheMemorySize property to change the setting.
2019-01-14T10:33:35,484][INFO ][main][IgniteKernal] Configured caches [in 'sysMemPlc' dataRegion: ['ignite-sys-cache']]
2019-01-14T10:33:35,485][WARN ][main][IgniteKernal] Peer class loading is enabled (disable it in production for performance and deployment consistency reasons)
2019-01-14T10:33:35,494][INFO ][main][IgniteKernal] 3-rd party licenses can be found at: /opt/ignite/apache-ignite-fabric-2.6.0-bin/libs/licenses
2019-01-14T10:33:35,494][INFO ][main][IgniteKernal] Local node user attribute [DATA_ROLE=BUDS]
2019-01-14T10:33:35,555][INFO ][main][IgnitePluginProcessor] Configured plugins:

2019-01-14T10:33:35,556][INFO ][main][IgnitePluginProcessor]   ^-- None
2019-01-14T10:33:35,556][INFO ][main][IgnitePluginProcessor]

2019-01-14T10:33:35,557][INFO ][main][FailureProcessor] Configured failure handler: [hnd=RestartProcessFailureHandler []]
2019-01-14T10:33:35,601][INFO ][main][TcpCommunicationSpi] Successfully bound communication NIO server to TCP port [port=47100, locHost=0.0.0.0/0.0.0.0, selectorsCnt=4, selectorSpins=0, pairedConn=false]
2019-01-14T10:33:35,603][WARN ][main][TcpCommunicationSpi] Message queue limit is set to 0 which may lead to potential OOMEs when running cache operations in FULL_ASYNC or PRIMARY_SYNC modes due to message queues growth on sender and receiver sides.
2019-01-14T10:33:35,626][WARN ][main][NoopCheckpointSpi] Checkpoints are disabled (to enable configure any GridCheckpointSpi implementation)
2019-01-14T10:33:35,650][WARN ][main][GridCollisionManager] Collision resolution is disabled (all jobs will be activated upon arrival).
2019-01-14T10:33:35,652][INFO ][main][IgniteKernal] Security status [authentication=off, tls/ssl=off]
2019-01-14T10:33:35,682][INFO ][main][TcpDiscoverySpi] Successfully bound to TCP port [port=47500, localHost=0.0.0.0/0.0.0.0, locNodeId=142b548a-6480-4c31-9559-7d7b2092175c]
2019-01-14T10:33:35,691][INFO ][main][PdsFoldersResolver] Successfully locked persistence storage folder [/data/ignite/storage/node00-cf5501f0-c13e-457f-8c34-7477b2101905]
2019-01-14T10:33:35,692][INFO ][main][PdsFoldersResolver] Consistent ID used for local node is [cf5501f0-c13e-457f-8c34-7477b2101905] according to persistence data storage folders
2019-01-14T10:33:35,692][INFO ][main][CacheObjectBinaryProcessorImpl] Resolved directory for serialized binary metadata: /opt/ignite/apache-ignite-fabric-2.6.0-bin/work/binary_meta/node00-cf5501f0-c13e-457f-8c34-7477b2101905
2019-01-14T10:33:35,813][WARN ][main][GridCacheProcessor] Deployment mode for cache is not CONTINUOUS or SHARED (it is recommended that you change deployment mode and restart): PRIVATE
2019-01-14T10:33:35,914][INFO ][main][FilePageStoreManager] Resolved page store work directory: /data/ignite/storage/node00-cf5501f0-c13e-457f-8c34-7477b2101905
2019-01-14T10:33:35,914][INFO ][main][FileWriteAheadLogManager] Resolved write ahead log work directory: /data/ignite/wal/node00-cf5501f0-c13e-457f-8c34-7477b2101905
2019-01-14T10:33:35,915][INFO ][main][FileWriteAheadLogManager] Resolved write ahead log archive directory: /opt/ignite/apache-ignite-fabric-2.6.0-bin/work/db/wal/archive/node00-cf5501f0-c13e-457f-8c34-7477b2101905
2019-01-14T10:33:35,942][INFO ][main][FileWriteAheadLogManager] Started write-ahead log manager [mode=LOG_ONLY]
2019-01-14T10:33:35,953][WARN ][main][GridCacheDatabaseSharedManager] Page eviction mode set for [DR_MEM] data will have no effect because the oldest pages are evicted automatically if Ignite persistence is enabled.
2019-01-14T10:33:35,975][INFO ][main][GridCacheDatabaseSharedManager] Read checkpoint status [startMarker=/data/ignite/storage/node00-cf5501f0-c13e-457f-8c34-7477b2101905/cp/1547192473734-c5ed49c2-0263-4d21-9c1e-63fcd0f6c9d6-START.bin, endMarker=/data/ignite/storage/node00-cf5501f0-c13e-457f-8c34-7477b2101905/cp/1547192473734-c5ed49c2-0263-4d21-9c1e-63fcd0f6c9d6-END.bin]
2019-01-14T10:33:35,988][INFO ][main][PageMemoryImpl] Started page memory [memoryAllocated=100.0 MiB, pages=24812, tableSize=1.9 MiB, checkpointBuffer=100.0 MiB]
2019-01-14T10:33:35,989][INFO ][main][GridCacheDatabaseSharedManager] Checking memory state [lastValidPos=FileWALPointer [idx=3612, fileOff=49901060, len=40363], lastMarked=FileWALPointer [idx=3612, fileOff=49901060, len=40363], lastCheckpointId=c5ed49c2-0263-4d21-9c1e-63fcd0f6c9d6]
2019-01-14T10:33:36,017][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: Failed to read WAL record at position: 49941423, ptr=FileWALPointer [idx=3612, fileOff=49941423, len=0]
2019-01-14T10:33:36,018][INFO ][main][GridCacheDatabaseSharedManager] Found last checkpoint marker [cpId=c5ed49c2-0263-4d21-9c1e-63fcd0f6c9d6, pos=FileWALPointer [idx=3612, fileOff=49901060, len=40363]]
2019-01-14T10:33:36,048][INFO ][main][GridCacheDatabaseSharedManager] Applying lost cache updates since last checkpoint record [lastMarked=FileWALPointer [idx=3612, fileOff=49901060, len=40363], lastCheckpointId=c5ed49c2-0263-4d21-9c1e-63fcd0f6c9d6]
2019-01-14T10:33:36,062][INFO ][main][FileWriteAheadLogManager] Stopping WAL iteration due to an exception: Failed to read WAL record at position: 49941423, ptr=FileWALPointer [idx=3612, fileOff=49941423, len=0]
2019-01-14T10:33:36,063][INFO ][main][GridCacheDatabaseSharedManager] Finished applying WAL changes [updatesApplied=0, time=10ms]
2019-01-14T10:33:36,113][INFO ][main][GridClusterStateProcessor] Restoring history for BaselineTopology[id=0]
2019-01-14T10:33:36,234][INFO ][main][ClientListenerProcessor] Client connector processor has started on TCP port 10800
2019-01-14T10:33:36,286][INFO ][main][GridTcpRestProtocol] Command protocol successfully started [name=TCP binary, host=0.0.0.0/0.0.0.0, port=11211]
2019-01-14T10:33:36,317][INFO ][main][IgniteKernal] Non-loopback local IPs: 10.37.184.213
2019-01-14T10:33:36,317][INFO ][main][IgniteKernal] Enabled local MACs: FA163E3C967A
2019-01-14T10:33:37,663][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery accepted incoming connection [rmtAddr=/10.37.184.217, rmtPort=37803]
2019-01-14T10:33:37,674][INFO ][tcp-disco-srvr-#2][TcpDiscoverySpi] TCP discovery spawning a new thread for connection [rmtAddr=/10.37.184.217, rmtPort=37803]
2019-01-14T10:33:37,675][INFO ][tcp-disco-sock-reader-#5][TcpDiscoverySpi] Started serving remote node connection [rmtAddr=/10.37.184.217:37803, rmtPort=37803]
2019-01-14T10:33:37,751][ERROR][tcp-disco-msg-worker-#3][TcpDiscoverySpi] TcpDiscoverSpi's message worker thread failed abnormally. Stopping the node in order to prevent cluster wide instability.
2019-01-14T10:33:37,757][ERROR][tcp-disco-msg-worker-#3][] Critical system error detected. Will be handled accordingly to configured handler [hnd=class o.a.i.failure.RestartProcessFailureHandler, failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Node with BaselineTopology cannot join mixed cluster running in compatibility mode]]
2019-01-14T10:33:37,760][ERROR][tcp-disco-msg-worker-#3][FailureProcessor] Ignite node is in invalid state due to a critical failure.
2019-01-14T10:33:37,761][ERROR][tcp-disco-msg-worker-#3][TcpDiscoverySpi] Runtime error caught during grid runnable execution: IgniteSpiThread [name=tcp-disco-msg-worker-#3]
2019-01-14T10:33:37,761][ERROR][node-restarter][] Restarting JVM on Ignite failure: [failureCtx=FailureContext [type=SYSTEM_WORKER_TERMINATION, err=class o.a.i.IgniteException: Node with BaselineTopology cannot join mixed cluster running in compatibility mode]]
[10:33:37] Restarting node. Will exit (250).
2019-01-14T10:33:37,757][ERROR][main][IgniteKernal] Failed to start manager: GridManagerAdapter [enabled=true, name=o.a.i.i.managers.discovery.GridDiscoveryManager]
2019-01-14T10:33:37,763][ERROR][main][IgniteKernal] Got exception while starting (will rollback startup routine).
[10:33:37] (wrn) Ignoring stopping Ignite instance that was already stopped or never started: null
2019-01-14T10:33:37,765][INFO ][node-stop-thread][TcpDiscoverySpi] Stopped the node successfully in response to TcpDiscoverySpi's message worker thread abnormal termination.
2019-01-14T10:33:37,776][INFO ][main][GridTcpRestProtocol] Command protocol successfully stopped: TCP binary
2019-01-14T10:33:37,788][INFO ][tcp-disco-sock-reader-#5][TcpDiscoverySpi] Finished serving remote node connection [rmtAddr=/10.37.184.217:37803, rmtPort=37803]

2019-01-14T10:33:37,955][INFO ][main][IgniteKernal]

>>> +---------------------------------------------------------------------------------+
>>> Ignite ver. 2.6.0#20180710-sha1:669feacc5d3a4e60efcdd300dc8de99780f38eed stopped OK
>>> +---------------------------------------------------------------------------------+

>>> Grid uptime: 00:00:03.441

class org.apache.ignite.IgniteException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
    at org.apache.ignite.internal.util.IgniteUtils.convertException(IgniteUtils.java:990)
    at org.apache.ignite.Ignition.start(Ignition.java:355)
    at org.apache.ignite.startup.cmdline.CommandLineStartup.main(CommandLineStartup.java:301)
2019-01-14T10:33:37,959][WARN ][node-restarter][G] Attempting to stop an already stopped Ignite instance (ignore): null
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1726)
    at org.apache.ignite.internal.IgniteKernal.start(IgniteKernal.java:1028)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start0(IgnitionEx.java:2014)
    at org.apache.ignite.internal.IgnitionEx$IgniteNamedInstance.start(IgnitionEx.java:1723)
    at org.apache.ignite.internal.IgnitionEx.start0(IgnitionEx.java:1151)
    at org.apache.ignite.internal.IgnitionEx.startConfigurations(IgnitionEx.java:1069)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:955)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:854)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:724)
    at org.apache.ignite.internal.IgnitionEx.start(IgnitionEx.java:693)
    at org.apache.ignite.Ignition.start(Ignition.java:352)
    ... 1 more
Caused by: class org.apache.ignite.IgniteCheckedException: Failed to start SPI: TcpDiscoverySpi [addrRslvr=null, sockTimeout=15000, ackTimeout=60000, marsh=JdkMarshaller [clsFilter=org.apache.ignite.marshaller.MarshallerUtils$1@41f4fe5], reconCnt=10, reconDelay=2000, maxAckTimeout=600000, forceSrvMode=false, clientReconnectDisabled=false, internalLsnr=null]
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:300)
    at org.apache.ignite.internal.managers.discovery.GridDiscoveryManager.start(GridDiscoveryManager.java:915)
    at org.apache.ignite.internal.IgniteKernal.startManager(IgniteKernal.java:1721)
    ... 11 more
Caused by: class org.apache.ignite.spi.IgniteSpiException: Thread has been interrupted.
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.joinTopology(ServerImpl.java:938)
    at org.apache.ignite.spi.discovery.tcp.ServerImpl.spiStart(ServerImpl.java:373)
    at org.apache.ignite.spi.discovery.tcp.TcpDiscoverySpi.spiStart(TcpDiscoverySpi.java:1948)
    at org.apache.ignite.internal.managers.GridManagerAdapter.startSpi(GridManagerAdapter.java:297)
    ... 13 more

Failed to start grid: Failed to start manager: GridManagerAdapter [enabled=true, name=org.apache.ignite.internal.managers.discovery.GridDiscoveryManager]


-----------------------------log end------------------------------
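
For anyone scanning a log like this for the root cause, a minimal sketch of a filter that splits the text on the timestamp prefix and keeps only the ERROR entries may help. The prefix layout (timestamp][LEVEL][thread][logger] message) is assumed from the entries shown above, and the names split_entries and errors_only are hypothetical helpers, not part of Ignite.

```python
import re

# Assumed prefix layout, taken from the entries in this log:
#   2019-01-14T10:33:37,751][ERROR][thread-name][LoggerName] message
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2},\d{3}\]")
ENTRY = re.compile(
    r"(?P<ts>[^\]]+)\]"           # timestamp up to the first ']'
    r"\[(?P<level>[A-Z]+)\s*\]"   # level, with optional padding ("INFO ")
    r"\[(?P<thread>[^\]]*)\]"     # thread name
    r"\[(?P<logger>[^\]]*)\]"     # logger name, may be empty
    r"\s*(?P<msg>.*)",            # rest of the entry
    re.DOTALL,
)


def split_entries(text):
    """Split raw log text into entries, even if several share one line."""
    starts = [m.start() for m in TIMESTAMP.finditer(text)]
    ends = starts[1:] + [len(text)]
    entries = []
    for begin, end in zip(starts, ends):
        m = ENTRY.match(text[begin:end])
        if m:
            entry = m.groupdict()
            entry["msg"] = entry["msg"].strip()
            entries.append(entry)
    return entries


def errors_only(text):
    """Return only the entries logged at ERROR level."""
    return [e for e in split_entries(text) if e["level"] == "ERROR"]
```

Calling errors_only on the full log text and printing each entry's ts, thread, and msg fields is one way to surface the discovery worker failure without reading the whole startup sequence.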


On 2019/1/12 at 12:58 AM, Ilya Kasnacheev wrote:

Hello!

Can you show what you get in the logs when your nodes attempt to join the cluster?


Regards,
--
Ilya Kasnacheev

Fri, 11 Jan 2019 at 19:43, 李玉珏@163 <[email protected]>:


    Hi,

    Currently, after cluster activation, if a node with native
    persistence enabled terminates abnormally, it cannot rejoin the
    cluster when it is restarted.

    So the questions are:

    1. If a node terminates abnormally, how can it rejoin the
    cluster?

    2. How can a node be restarted gracefully?
