StephanEwen commented on a change in pull request #13447:
URL: https://github.com/apache/flink/pull/13447#discussion_r492720319
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.java
##########
@@ -109,89 +94,58 @@
 		}
 	}
 
-	protected void emit(T record, int targetChannel) throws IOException, InterruptedException {
+	protected void emit(T record, int targetSubpartition) throws IOException {
 		checkErroneous();
-		serializer.serializeRecord(record);
-
-		// Make sure we don't hold onto the large intermediate serialization buffer for too long
-		copyFromSerializerToTargetChannel(targetChannel);
-	}
-
-	/**
-	 * @param targetChannel
-	 * @return <tt>true</tt> if the intermediate serialization buffer should be pruned
-	 */
-	protected boolean copyFromSerializerToTargetChannel(int targetChannel) throws IOException, InterruptedException {
-		// We should reset the initial position of the intermediate serialization buffer before
-		// copying, so the serialization results can be copied to multiple target buffers.
-		serializer.reset();
-
-		boolean pruneTriggered = false;
-		BufferBuilder bufferBuilder = getBufferBuilder(targetChannel);
-		SerializationResult result = serializer.copyToBufferBuilder(bufferBuilder);
-		while (result.isFullBuffer()) {
-			finishBufferBuilder(bufferBuilder);
-
-			// If this was a full record, we are done. Not breaking out of the loop at this point
-			// will lead to another buffer request before breaking out (that would not be a
-			// problem per se, but it can lead to stalls in the pipeline).
-			if (result.isFullRecord()) {
-				pruneTriggered = true;
-				emptyCurrentBufferBuilder(targetChannel);
-				break;
-			}
-
-			bufferBuilder = requestNewBufferBuilder(targetChannel);
-			result = serializer.copyToBufferBuilder(bufferBuilder);
-		}
-		checkState(!serializer.hasSerializedData(), "All data should be written at once");
+		targetPartition.emitRecord(serializeRecord(serializer, record), targetSubpartition);
 
 		if (flushAlways) {
-			flushTargetPartition(targetChannel);
+			targetPartition.flush(targetSubpartition);
 		}
-		return pruneTriggered;
 	}
 
 	public void broadcastEvent(AbstractEvent event) throws IOException {
 		broadcastEvent(event, false);
 	}
 
 	public void broadcastEvent(AbstractEvent event, boolean isPriorityEvent) throws IOException {
-		try (BufferConsumer eventBufferConsumer = EventSerializer.toBufferConsumer(event)) {
-			for (int targetChannel = 0; targetChannel < numberOfChannels; targetChannel++) {
-				tryFinishCurrentBufferBuilder(targetChannel);
-
-				// Retain the buffer so that it can be recycled by each channel of targetPartition
-				targetPartition.addBufferConsumer(eventBufferConsumer.copy(), targetChannel, isPriorityEvent);
-			}
+		targetPartition.broadcastEvent(event, isPriorityEvent);
 
-			if (flushAlways) {
-				flushAll();
-			}
+		if (flushAlways) {
+			flushAll();
 		}
 	}
 
-	public void flushAll() {
-		targetPartition.flushAll();
+	@VisibleForTesting
+	public static ByteBuffer serializeRecord(

Review comment:
It would be really great if this method were not public. Ideally we could remove it completely, because every test that uses it bypasses some crucial logic of this class, which can render those tests meaningless.

The method is used in three places:
- The occurrence in `SingleInputGateTest` can be replaced with emitting a record.
- The occurrence in `TestPartitionProducer` could be removed by adjusting `TestProducerSource` to produce `ByteBuffer` instead of `BufferConsumer`, which looks like a nice change that might even simplify things (a rough sketch follows below).
- The occurrence in `PartitionTestUtils` could in theory be kept, with the visibility of the method reduced to package-private.
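For illustration, here is a minimal sketch of what the second point could look like: a `TestProducerSource` that hands out the serialized bytes plus the target subpartition instead of ready-made `BufferConsumer`s. The `BufferAndChannel` holder and all names are hypothetical, not the actual Flink API:

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of an adjusted TestProducerSource that produces the
 * serialized bytes plus the target subpartition, so the test producer can go
 * through the regular record-writing path instead of building BufferConsumers
 * itself. The BufferAndChannel holder and all names here are illustrative.
 */
public interface TestProducerSource {

	/** Returns the next record to emit, or null once the source is exhausted. */
	BufferAndChannel getNextBuffer() throws Exception;

	/** Simple holder for the serialized record and its target subpartition. */
	class BufferAndChannel {

		private final ByteBuffer buffer;

		private final int targetChannel;

		public BufferAndChannel(ByteBuffer buffer, int targetChannel) {
			this.buffer = buffer;
			this.targetChannel = targetChannel;
		}

		public ByteBuffer getBuffer() {
			return buffer;
		}

		public int getTargetChannel() {
			return targetChannel;
		}
	}
}
```

With a source like this, `TestPartitionProducer` could feed the returned bytes through the regular partition-writing path (for example via `emitRecord`, as in the diff above) rather than bypassing the serialization logic.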
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/BoundedBlockingResultPartition.java
##########
@@ -63,6 +63,22 @@ public BoundedBlockingResultPartition(
 			bufferPoolFactory);
 	}
 
+	@Override
+	public void flush(int targetSubpartition) {
+		finishBroadcastBufferBuilder();

Review comment:
Just to double check: we do not want this to be the default behavior in `BufferWritingResultPartition`, because that would finish the partial buffers for the streaming/pipelined cases as well, which we don't want.

I think this logic may be confusing for future developers. What we could do is the following (see the sketch after this list):
- `BufferWritingResultPartition` leaves the `void flush(int)` and `flushAll()` methods abstract.
- Instead, it offers `protected void flushSubpartition(int partition, boolean finishProducers)` and `protected void flushAllSubpartitions(boolean finishProducers)`. That makes it explicit that there is a producer which may or may not be finished, so every caller has to be aware of this behavior.
- `BoundedBlockingResultPartition` then implements `flushAll() { flushAllSubpartitions(true); }`, and `PipelinedResultPartition` implements `flushAll() { flushAllSubpartitions(false); }`.
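To make the suggestion concrete, here is a compilable sketch of that split. All Flink-specific logic is reduced to stub hooks (`finishBufferBuilders`, `flushSubpartitionData`, and `flushAllSubpartitionData` are made-up stand-ins for the existing buffer-builder and subpartition calls); only the shape of the API follows the bullets above:

```java
// Sketch only: stand-ins for the Flink classes, not the real implementation.
abstract class BufferWritingResultPartition {

	/** Stand-in for finishBroadcastBufferBuilder() and the unicast equivalents. */
	protected void finishBufferBuilders() {
		// finish the current partial BufferBuilders so their data becomes consumable
	}

	/** Stand-in for flushing the data of a single underlying subpartition. */
	protected void flushSubpartitionData(int subpartition) {
		// subpartitions[subpartition].flush() in the real class
	}

	/** Stand-in for flushing all underlying subpartitions. */
	protected void flushAllSubpartitionData() {
		// loop over all subpartitions in the real class
	}

	// flush(int) and flushAll() stay abstract: each subclass has to decide
	// explicitly whether flushing also finishes the current producers.
	public abstract void flush(int subpartition);

	public abstract void flushAll();

	protected void flushSubpartition(int subpartition, boolean finishProducers) {
		if (finishProducers) {
			finishBufferBuilders();
		}
		flushSubpartitionData(subpartition);
	}

	protected void flushAllSubpartitions(boolean finishProducers) {
		if (finishProducers) {
			finishBufferBuilders();
		}
		flushAllSubpartitionData();
	}
}

/** Blocking partitions are consumed only after they are finished, so a flush may also finish the partial buffers. */
class BoundedBlockingResultPartition extends BufferWritingResultPartition {

	@Override
	public void flush(int subpartition) {
		flushSubpartition(subpartition, true);
	}

	@Override
	public void flushAll() {
		flushAllSubpartitions(true);
	}
}

/** Pipelined partitions keep appending to the current buffers, so a flush must not finish the producers. */
class PipelinedResultPartition extends BufferWritingResultPartition {

	@Override
	public void flush(int subpartition) {
		flushSubpartition(subpartition, false);
	}

	@Override
	public void flushAll() {
		flushAllSubpartitions(false);
	}
}
```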