Denis Chudov created IGNITE-25418: ------------------------------------- Summary: Long write of 1000-rows batches into storage Key: IGNITE-25418 URL: https://issues.apache.org/jira/browse/IGNITE-25418 Project: Ignite Issue Type: Bug Reporter: Denis Chudov
While investigating "Send with retry timeout" exceptions occurring under the load from data streamer, I logged the time of call in PartitionListener#handleUpdateAllCommand here: {code:java} storageUpdateHandler.handleUpdateAll( txId, cmd.rowsToUpdate(), cmd.commitPartitionId().asReplicationGroupId(), !cmd.full(), () -> storage.lastApplied(commandIndex, commandTerm), cmd.full() ? safeTimestamp : null, indexIdsAtRwTxBeginTs(catalogService, txId, storage.tableId()) ); {code} At some time, it took over hundreds of milliseconds: {code:java} [2025-05-19T11:33:22,849][INFO ][%istod_sdt_3345%JRaft-FSMCaller-Disruptor_stripe_3-0][PartitionListener] qqq groupId=19_part_9, storageWriteMcs=341377, hash=-1115824889 {code} In this line, the number is in microseconds. Also, I see that the delta files' compaction time is constantly growing (look at the two pairs of lines, page number is the same but compaction time grew dramatically, as well as the checkpoint fsync time): {code:java} 2025-03-26 14:46:23:999 +0000 [INFO][%node1%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=67f35966-b0fd-4377-8e5b-7a3af7d73aa2, pages=140807, pagesWriteTime=1132ms, fsyncTime=2200ms, replicatorLogSyncTime=126ms, totalTime=3591ms, avgWriteSpeed=613MB/s]2025-03-26 14:46:26:609 +0000 [INFO][%node1%compaction-thread][Compactor] Compaction round finished [compactionId=8fc11431-783f-4111-8e5b-017870110cf5, pages=140807, duration=2610ms, avgWriteSpeed=843MB/s] 2025-03-26 15:11:02:160 +0000 [INFO][%node1%checkpoint-thread][Checkpointer] Checkpoint finished [checkpointId=2b4d735f-1b09-471a-a47e-3d5c5c8c7f5d, pages=165589, pagesWriteTime=1476ms, fsyncTime=6027ms, replicatorLogSyncTime=230ms, totalTime=7798ms, avgWriteSpeed=332MB/s]2025-03-26 15:11:07:276 +0000 [INFO][%node1%compaction-thread][Compactor] Compaction round finished [compactionId=6984cff7-e1e1-4ec7-ba5a-5e5d7037898a, pages=170116, duration=10213ms, avgWriteSpeed=260MB/s] {code} I have also seen the plenty of lines like this: {code:java} [2025-05-19T11:32:41,268][INFO ][checkpoint-runner-io0][CheckpointPagesWriter] Checkpoint pages were not written yet due to unsuccessful page write lock acquisition and will be retried [pageCount=1] {code} *Definition of done* The reason of long write time and the degradation of compaction time should be investigated and fixed, if applicable. *Notes* Reproducer is attached. -- This message was sent by Atlassian Jira (v8.20.10#820010)