StephanEwen commented on a change in pull request #13447:
URL: https://github.com/apache/flink/pull/13447#discussion_r492720319
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/api/writer/RecordWriter.java
##########
@@ -109,89 +94,58 @@
 		}
 	}
 
-	protected void emit(T record, int targetChannel) throws IOException, InterruptedException {
+	protected void emit(T record, int targetSubpartition) throws IOException {
 		checkErroneous();
-		serializer.serializeRecord(record);
-
-		// Make sure we don't hold onto the large intermediate serialization buffer for too long
-		copyFromSerializerToTargetChannel(targetChannel);
-	}
-
-	/**
-	 * @param targetChannel
-	 * @return <tt>true</tt> if the intermediate serialization buffer should be pruned
-	 */
-	protected boolean copyFromSerializerToTargetChannel(int targetChannel) throws IOException, InterruptedException {
-		// We should reset the initial position of the intermediate serialization buffer before
-		// copying, so the serialization results can be copied to multiple target buffers.
-		serializer.reset();
-
-		boolean pruneTriggered = false;
-		BufferBuilder bufferBuilder = getBufferBuilder(targetChannel);
-		SerializationResult result = serializer.copyToBufferBuilder(bufferBuilder);
-		while (result.isFullBuffer()) {
-			finishBufferBuilder(bufferBuilder);
-
-			// If this was a full record, we are done. Not breaking out of the loop at this point
-			// will lead to another buffer request before breaking out (that would not be a
-			// problem per se, but it can lead to stalls in the pipeline).
-			if (result.isFullRecord()) {
-				pruneTriggered = true;
-				emptyCurrentBufferBuilder(targetChannel);
-				break;
-			}
-
-			bufferBuilder = requestNewBufferBuilder(targetChannel);
-			result = serializer.copyToBufferBuilder(bufferBuilder);
-		}
-		checkState(!serializer.hasSerializedData(), "All data should be written at once");
+		targetPartition.emitRecord(serializeRecord(serializer, record), targetSubpartition);
 
 		if (flushAlways) {
-			flushTargetPartition(targetChannel);
+			targetPartition.flush(targetSubpartition);
 		}
-		return pruneTriggered;
 	}
 
 	public void broadcastEvent(AbstractEvent event) throws IOException {
 		broadcastEvent(event, false);
 	}
 
 	public void broadcastEvent(AbstractEvent event, boolean isPriorityEvent) throws IOException {
-		try (BufferConsumer eventBufferConsumer = EventSerializer.toBufferConsumer(event)) {
-			for (int targetChannel = 0; targetChannel < numberOfChannels; targetChannel++) {
-				tryFinishCurrentBufferBuilder(targetChannel);
-
-				// Retain the buffer so that it can be recycled by each channel of targetPartition
-				targetPartition.addBufferConsumer(eventBufferConsumer.copy(), targetChannel, isPriorityEvent);
-			}
+		targetPartition.broadcastEvent(event, isPriorityEvent);
 
-			if (flushAlways) {
-				flushAll();
-			}
+		if (flushAlways) {
+			flushAll();
 		}
 	}
 
-	public void flushAll() {
-		targetPartition.flushAll();
+	@VisibleForTesting
+	public static ByteBuffer serializeRecord(

Review comment:
It would be really great if this method were not public. Ideally we could remove it completely, because every test that uses it bypasses some crucial logic of this class, which can render those tests meaningless.

The method is used in three places:
- The occurrence in `SingleInputGateTest` can be replaced with emitting a record.
- The occurrence in `TestPartitionProducer` could be removed by adjusting `TestProducerSource` to produce `ByteBuffer` instead of `BufferConsumer`, which looks like a nice change that might even simplify things (a rough sketch follows below).
- The occurrence in `PartitionTestUtils` could in theory be kept, with the visibility of the method reduced to package-private.
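For illustration, here is a minimal sketch of what the second point could look like: a `TestProducerSource` that hands out the serialized bytes plus the target subpartition instead of ready-made `BufferConsumer`s. The `BufferAndChannel` holder and all names are hypothetical, not the actual Flink API:

```java
import java.nio.ByteBuffer;

/**
 * Hypothetical sketch of an adjusted TestProducerSource that produces the
 * serialized bytes plus the target subpartition, so the test producer can go
 * through the regular record-writing path instead of building BufferConsumers
 * itself. The BufferAndChannel holder and all names here are illustrative.
 */
public interface TestProducerSource {

	/** Returns the next record to emit, or null once the source is exhausted. */
	BufferAndChannel getNextBuffer() throws Exception;

	/** Simple holder for the serialized record and its target subpartition. */
	class BufferAndChannel {

		private final ByteBuffer buffer;

		private final int targetChannel;

		public BufferAndChannel(ByteBuffer buffer, int targetChannel) {
			this.buffer = buffer;
			this.targetChannel = targetChannel;
		}

		public ByteBuffer getBuffer() {
			return buffer;
		}

		public int getTargetChannel() {
			return targetChannel;
		}
	}
}
```

With a source like this, `TestPartitionProducer` could feed the returned bytes through the regular partition-writing path (for example via `emitRecord`, as in the diff above) rather than bypassing the serialization logic.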
##########
File path: flink-runtime/src/main/java/org/apache/flink/runtime/io/network/partition/BoundedBlockingResultPartition.java
##########
@@ -63,6 +63,22 @@ public BoundedBlockingResultPartition(
 			bufferPoolFactory);
 	}
 
+	@Override
+	public void flush(int targetSubpartition) {
+		finishBroadcastBufferBuilder();

Review comment:
Just to double check: we do not want this to be the default behavior in `BufferWritingResultPartition`, because that would finish the partial buffers for the streaming/pipelined cases as well, which we don't want.

I think this logic may be confusing for future developers. What we could do is the following (see the sketch after this list):
- `BufferWritingResultPartition` leaves the `void flush(int)` and `flushAll()` methods abstract.
- Instead, it offers `protected void flushSubpartition(int partition, boolean finishProducers)` and `protected void flushAllSubpartitions(boolean finishProducers)`. That makes it explicit that there is a producer which may or may not be finished, so every caller has to be aware of this behavior.
- `BoundedBlockingResultPartition` then implements `flushAll() { flushAllSubpartitions(true); }`, and `PipelinedResultPartition` implements `flushAll() { flushAllSubpartitions(false); }`.
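To make the suggestion concrete, here is a compilable sketch of that split. All Flink-specific logic is reduced to stub hooks (`finishBufferBuilders`, `flushSubpartitionData`, and `flushAllSubpartitionData` are made-up stand-ins for the existing buffer-builder and subpartition calls); only the shape of the API follows the bullets above:

```java
// Sketch only: stand-ins for the Flink classes, not the real implementation.
abstract class BufferWritingResultPartition {

	/** Stand-in for finishBroadcastBufferBuilder() and the unicast equivalents. */
	protected void finishBufferBuilders() {
		// finish the current partial BufferBuilders so their data becomes consumable
	}

	/** Stand-in for flushing the data of a single underlying subpartition. */
	protected void flushSubpartitionData(int subpartition) {
		// subpartitions[subpartition].flush() in the real class
	}

	/** Stand-in for flushing all underlying subpartitions. */
	protected void flushAllSubpartitionData() {
		// loop over all subpartitions in the real class
	}

	// flush(int) and flushAll() stay abstract: each subclass has to decide
	// explicitly whether flushing also finishes the current producers.
	public abstract void flush(int subpartition);

	public abstract void flushAll();

	protected void flushSubpartition(int subpartition, boolean finishProducers) {
		if (finishProducers) {
			finishBufferBuilders();
		}
		flushSubpartitionData(subpartition);
	}

	protected void flushAllSubpartitions(boolean finishProducers) {
		if (finishProducers) {
			finishBufferBuilders();
		}
		flushAllSubpartitionData();
	}
}

/** Blocking partitions are consumed only after they are finished, so a flush may also finish the partial buffers. */
class BoundedBlockingResultPartition extends BufferWritingResultPartition {

	@Override
	public void flush(int subpartition) {
		flushSubpartition(subpartition, true);
	}

	@Override
	public void flushAll() {
		flushAllSubpartitions(true);
	}
}

/** Pipelined partitions keep appending to the current buffers, so a flush must not finish the producers. */
class PipelinedResultPartition extends BufferWritingResultPartition {

	@Override
	public void flush(int subpartition) {
		flushSubpartition(subpartition, false);
	}

	@Override
	public void flushAll() {
		flushAllSubpartitions(false);
	}
}
```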