[ https://issues.apache.org/jira/browse/FLINK-11775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16815190#comment-16815190 ]
Jingsong Lee commented on FLINK-11775: -------------------------------------- I think my goal is to optimize the serialization of BinaryRow, which currently occurs on two views: 1. AbstractPagedOutputView: In Sort, HashTable, etc. 2. DataOutputSerializer: (Because bytes is saved to byte[] in DataOutputSerializer, it can be directly copied from MemorySegment.) Scenario 1: It happened in RecordWriter and is about to be sent to the network. Scenario 2: In the serialization of RocksDBValueState. My original intention was to optimize the serialization of BinaryRow on both views. The current idea is: Let AbstractPagedOutputView and DataOutputSerializer implement MemorySegmentWritable. In AbstractPagedOutputView, implement write(MemorySegment segment, int off, int len) to use MemorySegment.copyTo(MemorySegment) In DataOutputSerializer, implement write(MemorySegment segment, int off, int len) to use MemorySegment.get(byte[]) Then in BinaryRowSerializer.serialize(), if the outputView isInstanceOf MemorySegmentWritable, call write(MemorySegment), or whether it is serialized using the DataOutputView interface. Thanks [~srichter] and [~StephanEwen] and [~pnowojski] for your advice: 1.let DataOutputView implement MemorySegmentWritable is a bad idea. Not every DataOutputView has the ability to deal directly with MemorySegment. 2.keep MemorySegmentWritable as internal is good. Only our Table can touch it. > Introduce MemorySegmentWritable to let DataOutputView direct copy to internal > bytes > ----------------------------------------------------------------------------------- > > Key: FLINK-11775 > URL: https://issues.apache.org/jira/browse/FLINK-11775 > Project: Flink > Issue Type: New Feature > Components: Runtime / Operators > Reporter: Jingsong Lee > Assignee: Jingsong Lee > Priority: Major > > Blink new binary format is based on MemorySegment. > Introduce MemorySegmentWritable to let DataOutputView direct copy to internal > bytes > {code:java} > /** > * Provides the interface for write(Segment). > */ > public interface MemorySegmentWritable { > /** > * Writes {@code len} bytes from memory segment {@code segment} starting at > offset {@code off}, in order, > * to the output. > * > * @param segment memory segment to copy the bytes from. > * @param off the start offset in the memory segment. > * @param len The number of bytes to copy. > * @throws IOException if an I/O error occurs. > */ > void write(MemorySegment segment, int off, int len) throws IOException; > }{code} > > If we want to write a Memory Segment to DataOutputView, we need to copy bytes > to byte[] and then write it in, which is less effective. > If we let AbstractPagedOutputView have a write(MemorySegment) interface, we > can copy it directly. > We need to ensure this in network serialization, batch operator calculation > serialization, Streaming State serialization to avoid new byte[] and copy. -- This message was sent by Atlassian JIRA (v7.6.3#76005)