[ https://issues.apache.org/jira/browse/IGNITE-25418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Denis Chudov updated IGNITE-25418:
----------------------------------
    Attachment: ItStreamTonsOfData.java

> Long write of 1000-rows batches into storage
> --------------------------------------------
>
>                 Key: IGNITE-25418
>                 URL: https://issues.apache.org/jira/browse/IGNITE-25418
>             Project: Ignite
>          Issue Type: Bug
>            Reporter: Denis Chudov
>            Priority: Major
>              Labels: ignite-3
>         Attachments: ItStreamTonsOfData.java
>
>
> While investigating "Send with retry timeout" exceptions occurring under load from the data streamer, I logged the duration of this call in PartitionListener#handleUpdateAllCommand:
> {code:java}
> storageUpdateHandler.handleUpdateAll(
>         txId,
>         cmd.rowsToUpdate(),
>         cmd.commitPartitionId().asReplicationGroupId(),
>         !cmd.full(),
>         () -> storage.lastApplied(commandIndex, commandTerm),
>         cmd.full() ? safeTimestamp : null,
>         indexIdsAtRwTxBeginTs(catalogService, txId, storage.tableId())
> ); {code}
> At times, the call took hundreds of milliseconds:
> {code:java}
> [2025-05-19T11:33:22,849][INFO ][%istod_sdt_3345%JRaft-FSMCaller-Disruptor_stripe_3-0][PartitionListener] qqq groupId=19_part_9, storageWriteMcs=341377, hash=-1115824889 {code}
> The storageWriteMcs value in this line is in microseconds.
> Also, I see that the compaction time of the delta files is constantly growing (compare the two pairs of lines below: the page counts are comparable, but the compaction time grew dramatically, as did the checkpoint fsync time):
> {code:java}
> 2025-03-26 14:46:23:999 +0000 [INFO][%node1%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=67f35966-b0fd-4377-8e5b-7a3af7d73aa2, pages=140807, pagesWriteTime=1132ms, fsyncTime=2200ms, replicatorLogSyncTime=126ms, totalTime=3591ms, avgWriteSpeed=613MB/s]
> 2025-03-26 14:46:26:609 +0000 [INFO][%node1%compaction-thread][Compactor] Compaction round finished [compactionId=8fc11431-783f-4111-8e5b-017870110cf5, pages=140807, duration=2610ms, avgWriteSpeed=843MB/s]
> 2025-03-26 15:11:02:160 +0000 [INFO][%node1%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=2b4d735f-1b09-471a-a47e-3d5c5c8c7f5d, pages=165589, pagesWriteTime=1476ms, fsyncTime=6027ms, replicatorLogSyncTime=230ms, totalTime=7798ms, avgWriteSpeed=332MB/s]
> 2025-03-26 15:11:07:276 +0000 [INFO][%node1%compaction-thread][Compactor] Compaction round finished [compactionId=6984cff7-e1e1-4ec7-ba5a-5e5d7037898a, pages=170116, duration=10213ms, avgWriteSpeed=260MB/s] {code}
> I have also seen plenty of lines like this:
> {code:java}
> [2025-05-19T11:32:41,268][INFO ][checkpoint-runner-io0][CheckpointPagesWriter] Checkpoint pages were not written yet due to unsuccessful page write lock acquisition and will be retried [pageCount=1] {code}
> *Definition of done*
> The reason for the long write time and the degradation of compaction time should be investigated and fixed, if applicable.
> *Notes*
> A reproducer is attached; put it into ignite-runner integrationTests.
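> For reference, below is a minimal, self-contained sketch of the kind of timing instrumentation behind the "qqq ... storageWriteMcs=..." log line above. The helper class and all names in it are hypothetical and are not the actual PartitionListener code; it only illustrates measuring a storage write with System.nanoTime() and reporting the elapsed time in microseconds.
> {code:java}
> import java.util.concurrent.TimeUnit;
>
> // Hypothetical illustration, not Ignite code: wraps a storage write and prints its
> // duration in microseconds, mirroring the storageWriteMcs field quoted in the issue.
> final class StorageWriteTimingSketch {
>     static void timedWrite(String groupId, int batchHash, Runnable storageWrite) {
>         long startNanos = System.nanoTime();
>         storageWrite.run();
>         long elapsedMcs = TimeUnit.NANOSECONDS.toMicros(System.nanoTime() - startNanos);
>         System.out.printf("qqq groupId=%s, storageWriteMcs=%d, hash=%d%n",
>                 groupId, elapsedMcs, batchHash);
>     }
>
>     public static void main(String[] args) {
>         // Stand-in for storageUpdateHandler.handleUpdateAll(...): sleeps ~341 ms so the
>         // printed line is comparable to the one reported above.
>         timedWrite("19_part_9", -1115824889, () -> {
>             try {
>                 Thread.sleep(341);
>             } catch (InterruptedException e) {
>                 Thread.currentThread().interrupt();
>             }
>         });
>     }
> } {code}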