Hi Arseny,

It seems this is already fixed in master [1], but there is another issue
[2] that we are in the middle of fixing.
We found some unsafe operations that modify memory without holding a lock.
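
To illustrate the general class of problem (this is not the Ignite code and not the actual fix), here is a minimal Java sketch. It assumes a hypothetical GuardedPage class that wraps a plain ByteBuffer; the class name, the method names and the use of ReentrantReadWriteLock are all illustrative assumptions. The point is only that a read-modify-write on shared page memory has to happen entirely under a lock, otherwise concurrent writers can leave the page in an inconsistent state.

```java
import java.nio.ByteBuffer;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for a memory page; NOT Ignite's page memory API. */
public class GuardedPage {
    private final ByteBuffer buf;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public GuardedPage(int size) {
        buf = ByteBuffer.allocate(size);
    }

    /** Read-modify-write of a long at the given offset, done entirely under the write lock. */
    public void increment(int offset) {
        lock.writeLock().lock();
        try {
            buf.putLong(offset, buf.getLong(offset) + 1);
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Reads a long at the given offset under the read lock. */
    public long read(int offset) {
        lock.readLock().lock();
        try {
            return buf.getLong(offset);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Runs several writer threads against one page and returns the final counter. */
    public static long runWriters(int threads, int incrementsPerThread) {
        GuardedPage page = new GuardedPage(4096);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++)
                    page.increment(0);
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers)
                t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for writers", e);
        }
        return page.read(0);
    }

    public static void main(String[] args) {
        // With the write lock in place no update is lost: prints 40000.
        System.out.println(runWriters(4, 10_000));
    }
}
```

If the lock were dropped from increment(), the same run could lose updates, which is the kind of silent damage that is hard to trace back once it reaches persisted pages.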


[1] https://issues.apache.org/jira/browse/IGNITE-6423
[2] https://issues.apache.org/jira/browse/IGNITE-7278

On Tue, Dec 26, 2017 at 1:02 PM, Arseny Kovalchuk <
[email protected]> wrote:

> Hi guys.
>
> We hit another issue when using Ignite 2.3 with native persistence enabled.
> See details below.
>
> We deploy Ignite along with our services in Kubernetes (v 1.8) on
> premises. The Ignite cluster is a StatefulSet of 5 Pods (5 instances) of
> Ignite version 2.3. Each Pod mounts a PersistentVolume backed by Ceph RBD.
>
> We put about 230 events/second into Ignite; 70% of events are ~200 KB in
> size and 30% are 5000 KB. The smaller events have indexed fields, and we
> query them via SQL.
>
> The cluster is activated from a client node, which also streams events into
> Ignite from Kafka. We use a custom streamer implementation that calls the
> cache.putAll() API.
>
> We started the cluster from scratch without any persistent data. After a
> while we got corrupted data with the following error message:
>
> [2017-12-26 07:44:14,251] ERROR [sys-#127%ignite-instance-2%]
> org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader:
> - Partition eviction failed, this can cause grid hang.
> class org.apache.ignite.IgniteException: Runtime failure on search row:
> Row@5b1479d6[ key: 171:1513946618964:3008806055072854, val:
> ru.synesis.kipod.event.KipodEvent [idHash=510912646, hash=-387621419,
> face_last_name=null, face_list_id=null, channel=171, source=,
> face_similarity=null, license_plate_number=null, descriptors=null,
> cacheName=kipod_events, cacheKey=171:1513946618964:3008806055072854,
> stream=171, alarm=false, processed_at=0, face_id=null, id=3008806055072854,
> persistent=false, face_first_name=null, license_plate_first_name=null,
> face_full_name=null, level=0, module=Kpx.Synesis.Outdoor,
> end_time=1513946624379, params=null, commented_at=0, tags=[vehicle, 0,
> human, 0, truck, 0, start_time=1513946618964, processed=false,
> kafka_offset=111259, license_plate_last_name=null, armed=false,
> license_plate_country=null, topic=MovingObject, comment=,
> expiration=1514033024000, original_id=null, license_plate_lists=null], ver:
> GridCacheVersion [topVer=125430590, order=1513955001926, nodeOrder=3] ][
> 3008806055072854, MovingObject, Kpx.Synesis.Outdoor, 0, , 1513946618964,
> 1513946624379, 171, 171, FALSE, FALSE, , FALSE, FALSE, 0, 0, 111259,
> 1514033024000, (vehicle, 0, human, 0, truck, 0), null, null, null, null,
> null, null, null, null, null, null, null, null ]
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1787)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.remove(BPlusTree.java:1578)
> at org.apache.ignite.internal.processors.query.h2.database.H2TreeIndex.remove(H2TreeIndex.java:216)
> at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.doUpdate(GridH2Table.java:496)
> at org.apache.ignite.internal.processors.query.h2.opt.GridH2Table.update(GridH2Table.java:423)
> at org.apache.ignite.internal.processors.query.h2.IgniteH2Indexing.remove(IgniteH2Indexing.java:580)
> at org.apache.ignite.internal.processors.query.GridQueryProcessor.remove(GridQueryProcessor.java:2334)
> at org.apache.ignite.internal.processors.cache.query.GridCacheQueryManager.remove(GridCacheQueryManager.java:461)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.finishRemove(IgniteCacheOffheapManagerImpl.java:1453)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl$CacheDataStoreImpl.remove(IgniteCacheOffheapManagerImpl.java:1416)
> at org.apache.ignite.internal.processors.cache.persistence.GridCacheOffheapManager$GridCacheDataStore.remove(GridCacheOffheapManager.java:1271)
> at org.apache.ignite.internal.processors.cache.IgniteCacheOffheapManagerImpl.remove(IgniteCacheOffheapManagerImpl.java:374)
> at org.apache.ignite.internal.processors.cache.GridCacheMapEntry.removeValue(GridCacheMapEntry.java:3233)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtCacheEntry.clearInternal(GridDhtCacheEntry.java:588)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.clearAll(GridDhtLocalPartition.java:951)
> at org.apache.ignite.internal.processors.cache.distributed.dht.GridDhtLocalPartition.tryEvict(GridDhtLocalPartition.java:809)
> at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:593)
> at org.apache.ignite.internal.processors.cache.distributed.dht.preloader.GridDhtPreloader$3.call(GridDhtPreloader.java:580)
> at org.apache.ignite.internal.util.IgniteUtils.wrapThreadLoader(IgniteUtils.java:6631)
> at org.apache.ignite.internal.processors.closure.GridClosureProcessor$2.body(GridClosureProcessor.java:967)
> at org.apache.ignite.internal.util.worker.GridWorker.run(GridWorker.java:110)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.IllegalStateException: Failed to get page IO instance (page content is corrupted)
> at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forVersion(IOVersions.java:83)
> at org.apache.ignite.internal.processors.cache.persistence.tree.io.IOVersions.forPage(IOVersions.java:95)
> at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:148)
> at org.apache.ignite.internal.processors.cache.persistence.CacheDataRowAdapter.initFromLink(CacheDataRowAdapter.java:102)
> at org.apache.ignite.internal.processors.query.h2.database.H2RowFactory.getRow(H2RowFactory.java:62)
> at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:126)
> at org.apache.ignite.internal.processors.query.h2.database.io.H2ExtrasLeafIO.getLookupRow(H2ExtrasLeafIO.java:36)
> at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:123)
> at org.apache.ignite.internal.processors.query.h2.database.H2Tree.getRow(H2Tree.java:40)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.getRow(BPlusTree.java:4372)
> at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:200)
> at org.apache.ignite.internal.processors.query.h2.database.H2Tree.compare(H2Tree.java:40)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.compare(BPlusTree.java:4359)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.findInsertionPoint(BPlusTree.java:4279)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.access$1500(BPlusTree.java:81)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$Search.run0(BPlusTree.java:261)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4697)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree$GetPageHandler.run(BPlusTree.java:4682)
> at org.apache.ignite.internal.processors.cache.persistence.tree.util.PageHandler.readPage(PageHandler.java:158)
> at org.apache.ignite.internal.processors.cache.persistence.DataStructure.read(DataStructure.java:319)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1823)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.removeDown(BPlusTree.java:1842)
> at org.apache.ignite.internal.processors.cache.persistence.tree.BPlusTree.doRemove(BPlusTree.java:1752)
> ... 23 more
>
>
> After restart we also
>
> Arseny Kovalchuk
>
> Senior Software Engineer at Synesis
> skype: arseny.kovalchuk
> mobile: +375 (29) 666-16-16
> LinkedIn Profile <http://www.linkedin.com/in/arsenykovalchuk/en>
>



-- 
Best regards,
Andrey V. Mashenkov
