iemejia commented on code in PR #3565:
URL: https://github.com/apache/parquet-java/pull/3565#discussion_r3343039837
##########
parquet-column/src/main/java/org/apache/parquet/column/values/plain/PlainValuesReader.java:
##########
@@ -19,120 +19,91 @@
package org.apache.parquet.column.values.plain;
import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
import org.apache.parquet.bytes.ByteBufferInputStream;
-import org.apache.parquet.bytes.LittleEndianDataInputStream;
import org.apache.parquet.column.values.ValuesReader;
-import org.apache.parquet.io.ParquetDecodingException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
/**
- * Plain encoding for float, double, int, long
+ * Plain encoding for float, double, int, long.
+ *
+ * <p>Reads directly from a {@link ByteBuffer} with {@link
ByteOrder#LITTLE_ENDIAN} byte order,
+ * bypassing the deprecated {@code LittleEndianDataInputStream} wrapper to
avoid per-value virtual
+ * dispatch overhead. The underlying page data is obtained as a single
contiguous {@link ByteBuffer}
+ * via {@link ByteBufferInputStream#slice(int)}.
*/
public abstract class PlainValuesReader extends ValuesReader {
private static final Logger LOG =
LoggerFactory.getLogger(PlainValuesReader.class);
- protected LittleEndianDataInputStream in;
Review Comment:
Good point. The old `protected LittleEndianDataInputStream in` field is now
`protected ByteBuffer buffer` — the type change is binary-incompatible
regardless, so a deprecation cycle wouldn't help external subclasses (they'd
get a compile error either way). I searched the project and only internal
subclasses (the 4 inner classes) access this field. I think this is acceptable
given this class was never annotated `@Public` and the field type change makes
deprecation impractical. WDYT?
##########
parquet-common/src/main/java/org/apache/parquet/bytes/BytesInput.java:
##########
@@ -643,6 +673,20 @@ public ByteBuffer toByteBuffer() throws IOException {
return java.nio.ByteBuffer.wrap(in, offset, length);
}
+ /**
+ * Zero-copy override: returns the backing array directly when fully used,
+ * skipping the base-class BAOS allocation + copy on every decompressor
call.
+ * Returning the mutable array is safe — the base class already exposes a
+ * mutable {@code BAOS.getBuf()}.
+ */
+ @Override
+ public byte[] toByteArray() {
Review Comment:
Addressed in f0bdac633. The base-class `toInputStream()` now tries
`getInternalByteBuffer()` first (zero-copy fast path) before falling back to
the deprecated `toByteBuffer()`. Also added `getInternalByteBuffer()` and
`toInputStream()` overrides to `ByteArrayBytesInput` so the byte-array-backed
path is zero-copy too. Added `@SuppressWarnings("deprecation")` on the
intentional overrides.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]