Re: [PR] GH-3530: Eagerly release column buffers during row group flush [parquet-java]

via GitHub Tue, 02 Jun 2026 09:30:52 -0700


iemejia commented on code in PR #3571:
URL: https://github.com/apache/parquet-java/pull/3571#discussion_r3342802314



##########
parquet-common/src/main/java/org/apache/parquet/bytes/ConcatenatingByteBufferCollector.java:
##########
@@ -78,6 +83,30 @@ public void writeAllTo(OutputStream out) throws IOException {
     }
   }
 
+  /**
+   * Writes all collected slabs to the given output stream, releasing each 
slab's
+   * {@link ByteBuffer} back to the allocator immediately after it has been 
written.
+   * This progressively frees memory during the write rather than holding all 
slabs
+   * until {@link #close()} is called.
+   *
+   * <p>After this method returns, the collector is empty and {@link #size()} 
returns 0.
+   * Calling {@link #close()} afterwards is safe but has no additional effect.
+   *
+   * @param out the output stream to write to
+   * @throws IOException if an I/O error occurs
+   */
+  public void writeAllToAndRelease(OutputStream out) throws IOException {

Review Comment:
   Good catch — it was indeed only used in tests. I've merged 
`writeAllToAndRelease` into `writeAllTo` so the progressive buffer release is 
now the production implementation (called via 
`ParquetFileWriter.writeColumnChunk` → `bytes.writeAllTo(out)`). The separate 
method is gone.



##########
parquet-common/src/main/java/org/apache/parquet/bytes/ConcatenatingByteBufferCollector.java:
##########
@@ -78,6 +83,30 @@ public void writeAllTo(OutputStream out) throws IOException {
     }
   }
 
+  /**
+   * Writes all collected slabs to the given output stream, releasing each 
slab's
+   * {@link ByteBuffer} back to the allocator immediately after it has been 
written.
+   * This progressively frees memory during the write rather than holding all 
slabs
+   * until {@link #close()} is called.
+   *
+   * <p>After this method returns, the collector is empty and {@link #size()} 
returns 0.
+   * Calling {@link #close()} afterwards is safe but has no additional effect.
+   *
+   * @param out the output stream to write to
+   * @throws IOException if an I/O error occurs
+   */
+  public void writeAllToAndRelease(OutputStream out) throws IOException {
+    WritableByteChannel channel = Channels.newChannel(out);

Review Comment:
   No — closing the channel would close the underlying `OutputStream` which is 
owned by the caller. The channel from `Channels.newChannel(out)` is just a 
stateless wrapper with no independent resources to release. I've added a 
comment explaining this in the updated code.



##########
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/ColumnChunkPageWriteStore.java:
##########
@@ -692,6 +692,11 @@ public void flushToFileWriter(ParquetFileWriter writer) 
throws IOException {
     for (ColumnDescriptor path : schema.getColumns()) {
       ColumnChunkPageWriter pageWriter = writers.get(path);

Review Comment:
   Done — switched to try-with-resources here. This also ensures the column's 
buffers are released even if `writeToFileWriter` throws.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] GH-3530: Eagerly release column buffers during row group flush [parquet-java]

Reply via email to