Hi Arseny,

This seems to be already fixed [1] in master, but there appears to be another issue [2], which we are in the middle of fixing. We found some unsafe memory-modifying operations that were performed without a lock.
[1] https://issues.apache.org/jira/browse/IGNITE-6423
[2] https://issues.apache.org/jira/browse/IGNITE-7278

On Tue, Dec 26, 2017 at 1:02 PM, Arseny Kovalchuk <[email protected]> wrote:

> Hi guys.
>
> Another issue when using Ignite 2.3 with native persistence enabled. See
> details below.
>
> We deploy Ignite along with our services in Kubernetes (v 1.8) on
> premises. Ignite cluster is a StatefulSet of 5 Pods (5 instances) of Ignite
> version 2.3. Each Pod mounts PersistentVolume backed by CEPH RBD.
>
> We put about 230 events/second into Ignite, 70% of events are ~200KB in
> size and 30% are 5000KB. Smaller events have indexed fields and we query
> them via SQL.
>
> The cluster is activated from a client node which also streams events into
> Ignite from Kafka. We use custom implementation of streamer which uses
> cache.putAll() API.
>
> We started cluster from scratch without any persistent data. After a while
> we got corrupted data with the error message.
>
> [2017-12-26 07:44:14,251] ERROR [sys-#127%ignite-instance-2%]
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader:
> - Partition eviction failed, this can cause grid hang.
> class org.apache.ignite.IgniteException: Runtime failure on search row:
> Row@5b1479d6[ key: 171:1513946618964:3008806055072854, val:
> ru.synesis.kipod.event.KipodEvent [idHash=510912646, hash=-387621419,
> face_last_name=null, face_list_id=null, channel=171, source=,
> face_similarity=null, license_plate_number=null, descriptors=null,
> cacheName=kipod_events, cacheKey=171:1513946618964:3008806055072854,
> stream=171, alarm=false, processed_at=0, face_id=null, id=3008806055072854,
> persistent=false, face_first_name=null, license_plate_first_name=null,
> face_full_name=null, level=0, module=Kpx.Synesis.Outdoor,
> end_time=1513946624379, params=null, commented_at=0, tags=[vehicle, 0,
> human, 0, truck, 0, start_time=1513946618964, processed=false,
> kafka_offset=111259, license_plate_last_name=null, armed=false,
> license_plate_country=null, topic=MovingObject, comment=,
> expiration=1514033024000, original_id=null, license_plate_lists=null], ver:
> GridCacheVersion [topVer=125430590, order=1513955001926, nodeOrder=3] ][
> 3008806055072854, MovingObject, Kpx.Synesis.Outdoor, 0, , 1513946618964,
> 1513946624379, 171, 171, FALSE, FALSE, , FALSE, FALSE, 0, 0, 111259,
> 1514033024000, (vehicle, 0, human, 0, truck, 0), null, null, null, null,
> null, null, null, null, null, null, null, null ]
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1787)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1578)
>     at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:216)
>     at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:496)
>     at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:423)
>     at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:580)
>     at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2334)
>     at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:461)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1453)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1416)
>     at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1271)
>     at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374)
>     at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:951)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:809)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
>     at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
>     at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6631)
>     at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
>     at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95)
>     at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:148)
>     at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102)
>     at org.apache.ignite.internal.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:62)
>     at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:126)
>     at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:36)
>     at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:123)
>     at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:40)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:4372)
>     at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:200)
>     at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:40)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:4359)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:4279)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1500(BPlusTree.java:81)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:261)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4697)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4682)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:158)
>     at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:319)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1823)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
>     at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1752)
>     ... 23 more
>
>
> After restart we also
>
>
> Arseny Kovalchuk
>
> Senior Software Engineer at Synesis
> skype: arseny.kovalchuk
> mobile: +375 (29) 666-16-16
> LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>
>

--
Best regards,
Andrey V. Mashenkov
