Contract for PartitionReader/InputPartition for ColumnarBatch?

Hello spark-dev,

Looking at ColumnarBatch [1], it seems that a single instance is meant to be
reused for the entire loading process.


Does this imply that Spark assumes the ColumnarBatch, and any objects that
directly reference its data (e.g. UTF8Strings), returned by
InputPartitionReader/PartitionReader [2][3] are invalidated once "next()"
is called on the reader?

Does the same apply for InternalRow?
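
To make the question concrete, here is a minimal Java sketch of the hazard I
have in mind. ReusedBatch, ReusingReader and BatchReuseDemo are hypothetical
stand-ins written for this email, not Spark classes; they only assume a reader
that hands back the same mutable object from every get() and overwrites it on
next().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a batch object, NOT Spark's ColumnarBatch.
final class ReusedBatch {
    final int[] values;   // backing storage, overwritten on every load
    int numRows;

    ReusedBatch(int capacity) {
        values = new int[capacity];
    }
}

// Hypothetical reader that reuses one ReusedBatch for all chunks.
final class ReusingReader {
    private final int[][] chunks;
    private final ReusedBatch batch;
    private int nextChunk = 0;

    ReusingReader(int[][] chunks) {
        this.chunks = chunks;
        int capacity = 0;
        for (int[] chunk : chunks) {
            capacity = Math.max(capacity, chunk.length);
        }
        this.batch = new ReusedBatch(capacity);
    }

    // Loads the next chunk into the single shared batch.
    boolean next() {
        if (nextChunk >= chunks.length) {
            return false;
        }
        int[] src = chunks[nextChunk++];
        System.arraycopy(src, 0, batch.values, 0, src.length);
        batch.numRows = src.length;
        return true;
    }

    // Always returns the same object; its contents change after next().
    ReusedBatch get() {
        return batch;
    }
}

final class BatchReuseDemo {
    // Unsafe under the reuse assumption: keeps references across next().
    static List<Integer> firstValuesHoldingReferences(int[][] chunks) {
        ReusingReader reader = new ReusingReader(chunks);
        List<ReusedBatch> held = new ArrayList<>();
        while (reader.next()) {
            held.add(reader.get());
        }
        List<Integer> out = new ArrayList<>();
        for (ReusedBatch b : held) {
            out.add(b.values[0]);
        }
        return out;
    }

    // Safe: copies the needed value out before calling next() again.
    static List<Integer> firstValuesCopying(int[][] chunks) {
        ReusingReader reader = new ReusingReader(chunks);
        List<Integer> out = new ArrayList<>();
        while (reader.next()) {
            out.add(reader.get().values[0]);
        }
        return out;
    }

    public static void main(String[] args) {
        int[][] chunks = {{1, 2}, {3, 4}};
        System.out.println(firstValuesHoldingReferences(chunks)); // [3, 3]
        System.out.println(firstValuesCopying(chunks));           // [1, 3]
    }
}
```

If the contract is that returned objects are only valid until the next call to
next(), then a consumer has to behave like firstValuesCopying; if it is not,
then a reader implementation like ReusingReader would be incorrect.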

Does it make sense to update the contracts one way or another? (I'm happy to
make a PR.)

Thanks,
Micah

[1]
https://github.com/apache/spark/blob/c341de8b3e1f1d3327bd4ae3b0d2ec048f64d306/sql/catalyst/src/main/java/org/apache/spark/sql/vectorized/ColumnarBatch.java
[2]
https://github.com/apache/spark/blob/branch-2.4/sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartitionReader.java
[3]
https://github.com/apache/spark/blob/a5efbb284e29b1d879490a4ee2c9fa08acec42b0/sql/catalyst/src/main/java/org/apache/spark/sql/connector/read/PartitionReader.java
