Vladimir Steshin created IGNITE-19904:
-----------------------------------------

             Summary: Assertion in defragmentation
                 Key: IGNITE-19904
                 URL: https://issues.apache.org/jira/browse/IGNITE-19904
             Project: Ignite
          Issue Type: Bug
            Reporter: Vladimir Steshin


Defragmentation fails with:
{code:java}
java.lang.AssertionError: Invalid state. Type is 0! pageId = 0001000d00024cbf
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.copyPageForCheckpoint(PageMemoryImpl.java:1359) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.checkpointWritePage(PageMemoryImpl.java:1277) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.writePages(CheckpointPagesWriter.java:208) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:150) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
{code}

It is difficult to write a test, and I can't reproduce the issue on my 
computers :(. It appears flakily on a server (4 core x 4 cpu) with 100G of test 
cache data and a million+ pages to checkpoint during defragmentation. It occurs 
more often with pageSize 1024 (which produces more pages).

Judging by my diagnostic build, I suppose that a fresh, empty page is caught in 
defragmentation. Here is a page dump with a test-extended PAGE_OVERHEAD (=64) 
and the same error raised a bit before copyPageForCheckpoint():
{code:java}
org.apache.ignite.IgniteException: Wrong page type in checkpointWritePage1. Page: Data region = 'defragPartitionsDataRegion'.
 FullPageId [pageId=281878703760205, effectivePageId=403727049549, grpId=-1368047378].
 PageDump = page_id: 281878703760205, rel_id: 48603, cache_id: -1368047378, pin: 0, lock: 65536, tmp_buf: 72057594037927935, test_val: 1. data_hex: 
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000000000000
    at org.apache.ignite.internal.processors.cache.persistence.pagemem.PageMemoryImpl.checkpointWritePage(PageMemoryImpl.java:1240) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.writePages(CheckpointPagesWriter.java:208) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
    at org.apache.ignite.internal.processors.cache.persistence.checkpoint.CheckpointPagesWriter.run(CheckpointPagesWriter.java:150) ~[ignite-core-2.16.0-SNAPSHOT.jar:2.16.0-SNAPSHOT]
{code}
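
The page IDs in both traces can be decoded by hand. The following is a minimal sketch, assuming the 64-bit layout that Ignite's PageIdUtils is understood to use (bits 0-31 page index, bits 32-47 partition ID, bits 48-55 flags). The class PageIdDecoder and its method names are hypothetical helpers, not part of Ignite. The dump itself supports the assumption about the lower 48 bits: masking pageId=281878703760205 to 48 bits gives exactly the reported effectivePageId=403727049549.

```java
final class PageIdDecoder {
    // Assumed layout, not taken from the report: 32-bit page index, 16-bit partition ID, 8-bit flags.
    static final int PAGE_IDX_BITS = 32;
    static final int PART_ID_BITS = 16;
    static final int FLAG_BITS = 8;

    /** Keeps only the partition ID and page index (the lower 48 bits). */
    static long effectivePageId(long pageId) {
        return pageId & ((1L << (PAGE_IDX_BITS + PART_ID_BITS)) - 1);
    }

    /** Lower 32 bits. */
    static long pageIndex(long pageId) {
        return pageId & ((1L << PAGE_IDX_BITS) - 1);
    }

    /** Bits 32-47. */
    static int partId(long pageId) {
        return (int)((pageId >>> PAGE_IDX_BITS) & ((1L << PART_ID_BITS) - 1));
    }

    /** Bits 48-55. */
    static int flag(long pageId) {
        return (int)((pageId >>> (PAGE_IDX_BITS + PART_ID_BITS)) & ((1L << FLAG_BITS) - 1));
    }

    static String describe(long pageId) {
        return String.format("pageId=%016x effectivePageId=%d partId=%d pageIdx=%d flag=%d",
            pageId, effectivePageId(pageId), partId(pageId), pageIndex(pageId), flag(pageId));
    }

    public static void main(String[] args) {
        // Page ID from the AssertionError message (printed there in hex).
        System.out.println(describe(Long.parseUnsignedLong("0001000d00024cbf", 16)));
        // prints: pageId=0001000d00024cbf effectivePageId=55834725567 partId=13 pageIdx=150719 flag=1

        // Page ID from the diagnostic dump (printed there in decimal).
        System.out.println(describe(281878703760205L));
        // prints: pageId=0001005e0001e34d effectivePageId=403727049549 partId=94 pageIdx=123725 flag=1
    }
}
```

Under this assumed layout, both page IDs carry flag 1 in the ID itself, while the type stored in the page content is 0. That is consistent with the all-zero data_hex in the dump.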

'test_val' is my diagnostic extension of the page header. Different numbers are 
written wherever tmp_buf is assigned. The value '1' comes from 
PageHeader::initNew(long absPtr, long relative):
{code:java}
    // PageHeader
    public static void initNew(long absPtr, long relative) {
        relative(absPtr, relative);

        tempBufferPointer(absPtr, PageMemoryImpl.INVALID_REL_PTR);

        GridUnsafe.putLong(absPtr, PAGE_MARKER);
        GridUnsafe.putInt(absPtr + PAGE_PIN_CNT_OFFSET, 0);

        // For diagnostic purposes.
        PageUtils.setTestTmpValue(absPtr, 1);
    }

    // PageUtils
    public static void setTestTmpValue(long absPtr, int val) {
        // Last 4 bytes of the (test-extended) page overhead, right before the page data.
        putInt(absPtr, PAGE_OVERHEAD - 4, val);
    }

    // PageMemoryImpl
    public static final int PAGE_OVERHEAD = 64;
{code}
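
To illustrate how the diagnostic marker relates to the dump, here is a minimal sketch that uses a plain ByteBuffer as a stand-in for off-heap memory. The class DiagnosticMarkerSketch, the helper dataIsBlank, and the constant ASSUMED_INVALID_REL_PTR are hypothetical. In particular, treating INVALID_REL_PTR as a 56-bit all-ones value is an assumption, chosen because it equals the tmp_buf value 72057594037927935 printed in the dump.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class DiagnosticMarkerSketch {
    // Test-extended overhead from the report.
    static final int PAGE_OVERHEAD = 64;
    // The marker occupies the last 4 bytes of the overhead.
    static final int TEST_VAL_OFFSET = PAGE_OVERHEAD - 4;
    // Assumption: INVALID_REL_PTR is a 56-bit all-ones value.
    static final long ASSUMED_INVALID_REL_PTR = (1L << 56) - 1;

    static void setTestTmpValue(ByteBuffer page, int val) {
        page.putInt(TEST_VAL_OFFSET, val);
    }

    static int testTmpValue(ByteBuffer page) {
        return page.getInt(TEST_VAL_OFFSET);
    }

    /** True if every byte after the overhead is zero, as in the reported data_hex. */
    static boolean dataIsBlank(ByteBuffer page) {
        for (int i = PAGE_OVERHEAD; i < page.capacity(); i++) {
            if (page.get(i) != 0)
                return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // A freshly allocated buffer is zero-filled, like a page that was initialized but never written.
        ByteBuffer page = ByteBuffer.allocate(PAGE_OVERHEAD + 1024).order(ByteOrder.nativeOrder());

        setTestTmpValue(page, 1); // what the diagnostic initNew() does

        System.out.println("test_val=" + testTmpValue(page));   // prints: test_val=1
        System.out.println("data blank=" + dataIsBlank(page));  // prints: data blank=true
        System.out.println("tmp_buf matches=" + (72057594037927935L == ASSUMED_INVALID_REL_PTR)); // prints: tmp_buf matches=true
    }
}
```

A page in this state (marker 1, invalid tmp_buf, zeroed data) would match the hypothesis that the page went through initNew() and was never filled before the checkpoint picked it up.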





