[ 
https://issues.apache.org/jira/browse/IGNITE-28836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18092659#comment-18092659
 ] 

Ignite TC Bot commented on IGNITE-28836:
----------------------------------------

{panel:title=Branch: [pull/13296/head] Base: [master] : No blockers 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#D6F7C1}{panel}
{panel:title=Branch: [pull/13296/head] Base: [master] : No new tests 
found!|borderStyle=dashed|borderColor=#ccc|titleBGColor=#F7D6C1}{panel}
[TeamCity *--> Run :: All* 
Results|https://ci2.ignite.apache.org/viewLog.html?buildId=9162070&buildTypeId=IgniteTests24Java8_RunAll]
{color:#ffffff}tcbot-analysis-comment chainBuildId=9162070 
rerunBuildIds=none{color}

> DirectMessageWriter: reduce per-field overhead and per-message allocations on 
> the message serialization hot path
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: IGNITE-28836
>                 URL: https://issues.apache.org/jira/browse/IGNITE-28836
>             Project: Ignite
>          Issue Type: Task
>            Reporter: Anton Vinogradov
>            Assignee: Anton Vinogradov
>            Priority: Major
>
> h3. Motivation
> org.apache.ignite.internal.direct.DirectMessageWriter is on the critical path 
> of every outgoing message: generated serializers call it field by field, and 
> the NIO write loop re-enters it for every network buffer. Two inefficiencies 
> show up there:
> # Per-field stream resolution. Each of the ~32 writeXxx methods starts with 
> {{DirectByteBufferStream stream = state.item().stream;}}, i.e. stack[pos] 
> (array load + bounds check) followed by a field load — re-evaluated on every 
> primitive write. The current stream only changes on setBuffer / 
> beforeNestedWrite / afterNestedWrite.
> # Per-field allocations in the compressed path. writeCompressedMessage() 
> allocates, for every compressed field, a fresh 
> ByteBuffer.allocateDirect(10KB) plus a brand-new DirectMessageWriter (its own 
> state stack + stream). The scratch buffer is only ever copied into a heap 
> byte[] by CompressedMessage.compress() (via buf.get(...)) before deflating, 
> so the direct allocation (native alloc + zeroing + Cleaner/GC reclamation) is 
> pure overhead. Heavy exchange messages (GridDhtPartitionsSingleMessage / 
> FullMessage) carry several compressed maps each, multiplying the cost during 
> PME.
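The per-field stream resolution described in item 1 can be illustrated with a minimal sketch. This is not the Ignite source: Stream, Item, and Writer below are hypothetical stand-ins, only an int write is modeled, and the two-level stack is an assumption made for brevity. It only contrasts the two access shapes, stack[pos].stream on every write versus a cached curStream field refreshed in setBuffer, beforeNestedWrite, and afterNestedWrite.

```java
import java.nio.ByteBuffer;

public class CachedStreamSketch {
    /** Hypothetical stand-in for DirectByteBufferStream; only writeInt is modeled. */
    static final class Stream {
        ByteBuffer buf;

        void writeInt(int v) {
            buf.putInt(v);
        }
    }

    /** Hypothetical state-stack entry holding one stream. */
    static final class Item {
        final Stream stream = new Stream();
    }

    /** Toy writer with a two-level state stack (top level plus one nested message). */
    static final class Writer {
        private final Item[] stack = {new Item(), new Item()};
        private int pos;

        /** Cached stream of the current stack item. */
        private Stream curStream = stack[0].stream;

        void setBuffer(ByteBuffer buf) {
            stack[pos].stream.buf = buf;
            curStream = stack[pos].stream; // refresh point 1
        }

        void beforeNestedWrite() {
            ByteBuffer buf = stack[pos].stream.buf;
            pos++;
            stack[pos].stream.buf = buf;   // nested stream writes to the same buffer
            curStream = stack[pos].stream; // refresh point 2
        }

        void afterNestedWrite() {
            pos--;
            curStream = stack[pos].stream; // refresh point 3
        }

        /** Baseline shape: array load, bounds check, and field load on every write. */
        void writeIntUncached(int v) {
            stack[pos].stream.writeInt(v);
        }

        /** Patched shape: a single field load on every write. */
        void writeIntCached(int v) {
            curStream.writeInt(v);
        }
    }

    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(16);
        Writer w = new Writer();

        w.setBuffer(buf);
        w.writeIntUncached(1);
        w.writeIntCached(2);
        w.beforeNestedWrite();
        w.writeIntCached(3);
        w.afterNestedWrite();
        w.writeIntCached(4);

        System.out.println("bytes written: " + buf.position());
    }
}
```

Both write methods produce the same bytes; the cached variant is only correct as long as every place that changes the current stack item also refreshes curStream.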
> h3. Proposed changes
> * Cache the current state item's stream in a curStream field; refresh it only 
> in setBuffer, beforeNestedWrite, afterNestedWrite. All writeXxx methods use 
> curStream instead of re-resolving state.item().stream.
> * In writeCompressedMessage():
> ** use ByteBuffer.allocate() (heap) for the scratch buffer instead of 
> allocateDirect();
> ** reuse a lazily-created, thread-confined tmpWriter (reset() before each 
> use) instead of allocating a new writer per field — mirroring how the main 
> writer is already reused across messages;
> ** grow the scratch buffer without the intermediate byte[] copy.
> No wire-format change, no public API change.
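The writeCompressedMessage() changes listed above can be sketched roughly as follows. This is an illustration under assumptions, not the patch itself: TmpWriter, serialize, and grow are hypothetical names, the writer handles only long values, and the real scratch buffer starts at about 10KB rather than a caller-supplied size. It shows a heap scratch buffer from ByteBuffer.allocate(), a lazily created writer that is reset() on reuse, and doubling growth that copies buffer to buffer without a temporary byte[].

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class ScratchBufferSketch {
    /** Hypothetical stand-in for the reusable temporary writer. */
    static final class TmpWriter {
        private ByteBuffer buf;

        /** Counts reuse, for illustration only. */
        int resets;

        void reset() {
            buf = null;
            resets++;
        }

        void setBuffer(ByteBuffer buf) {
            this.buf = buf;
        }

        /** Returns false when the value does not fit, so the caller can grow the buffer. */
        boolean writeLong(long v) {
            if (buf.remaining() < Long.BYTES)
                return false;

            buf.putLong(v);

            return true;
        }
    }

    /** Created lazily; assumed to be used by a single thread only. */
    TmpWriter tmpWriter;

    /** Doubles the capacity and copies the written bytes directly, with no intermediate byte[]. */
    static ByteBuffer grow(ByteBuffer old) {
        ByteBuffer bigger = ByteBuffer.allocate(old.capacity() * 2);

        old.flip();      // make [0, position) readable
        bigger.put(old); // bulk buffer-to-buffer copy

        return bigger;
    }

    /** Serializes the values into a heap scratch buffer; initCap must be positive. */
    byte[] serialize(long[] vals, int initCap) {
        if (tmpWriter == null)
            tmpWriter = new TmpWriter();
        else
            tmpWriter.reset();

        ByteBuffer scratch = ByteBuffer.allocate(initCap); // heap, not allocateDirect()

        tmpWriter.setBuffer(scratch);

        for (long v : vals) {
            while (!tmpWriter.writeLong(v)) {
                scratch = grow(scratch);
                tmpWriter.setBuffer(scratch);
            }
        }

        // A heap buffer exposes its backing array, so a compressor could read it directly.
        return Arrays.copyOf(scratch.array(), scratch.position());
    }
}
```

Calling serialize twice on the same instance exercises the reset() branch, similar in spirit to the second-marshal check mentioned in the Testing section of the issue.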
> h3. Benchmark (JMH, JDK 17, throughput; A/B baseline vs patched)
> || Benchmark || Baseline || Patched || Delta ||
> | hotPathPrimitiveFields (1792 write calls/op) | ~551K ops/s | ~682K ops/s | +24% |
> | compressed scratch acquire (direct+new -> heap+reuse) | 1.21M ops/s | 3.68M ops/s | x3.0 |
> | compressed scratch: GC time | 1006 ms | 130 ms | x8 less |
> The compressed path trades a little cheap young-gen heap churn for the 
> elimination of off-heap / Cleaner direct-buffer churn, cutting total GC time 
> ~8x.
> h3. Testing
> * A JMH benchmark JmhDirectMessageWriterBenchmark is added under 
> modules/benchmarks.
> * Correctness verified by byte-for-byte writer->reader round-trips, identical 
> between baseline and patched:
> ** primitives/arrays/String/UUID, 5000 records, 32-byte write buffer 
> (thousands of setBuffer cycles);
> ** compressed map (4000 entries -> exercises scratch-buffer doubling; 16-byte 
> chunks; second marshal reusing the writer -> exercises the tmpWriter.reset() 
> branch).
> * Existing DirectMarshallingMessagesTest covers the nested / compressed 
> serialization paths.
> h3. Compatibility / Risk
> Behavior-preserving, no protocol change; safe to backport.


