beliefer opened a new issue, #10649:
URL: https://github.com/apache/incubator-gluten/issues/10649
### Description
Currently, `identifyBatchType` is called from many places in Gluten, and it is
often called repeatedly on the same batch. For example, `getNativeHandle`
indirectly calls `identifyBatchType` twice for the same batch.
```java
public static long getNativeHandle(String backendName, ColumnarBatch batch) {
  // 1st call to identifyBatchType (via isZeroColumnBatch)
  if (isZeroColumnBatch(batch)) {
    final ColumnarBatchJniWrapper jniWrapper =
        ColumnarBatchJniWrapper.create(
            Runtimes.contextInstance(backendName, "ColumnarBatches#getNativeHandle"));
    return jniWrapper.getForEmptySchema(batch.numRows());
  }
  // 2nd call to identifyBatchType (via getIndicatorVector -> isLightBatch)
  return getIndicatorVector(batch).handle();
}
```
The implementation of `isZeroColumnBatch` is shown below.
```java
static boolean isZeroColumnBatch(ColumnarBatch batch) {
  return identifyBatchType(batch) == BatchType.ZERO_COLUMN;
}
```
As we can see, `isZeroColumnBatch` calls `identifyBatchType` once.
The implementation of `getIndicatorVector` is shown below.
```java
private static IndicatorVector getIndicatorVector(ColumnarBatch input) {
  if (!isLightBatch(input)) {
    throw new UnsupportedOperationException("Input batch is not light batch");
  }
  return (IndicatorVector) input.column(0);
}
```
Here, `getIndicatorVector` calls `isLightBatch`.
The implementation of `isLightBatch` is shown below.
```java
static boolean isLightBatch(ColumnarBatch batch) {
  return identifyBatchType(batch) == BatchType.LIGHT;
}
```
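To make the redundancy concrete, here is a minimal, self-contained sketch that mirrors the call path above with a counter on `identifyBatchType`. `Batch`, `ColumnVector`, `IndicatorVector`, `PlaceholderVector` and the `identifyCalls` counter are hypothetical, simplified stand-ins (not Spark's or Gluten's actual classes), and the zero-column branch returns a dummy value instead of calling JNI. For a light batch, one `getNativeHandle` call classifies the batch twice.

```java
import java.util.List;

/** Simplified stand-ins for the Spark/Gluten types (hypothetical, for illustration only). */
final class RedundantCallDemo {
  interface ColumnVector {}

  static final class IndicatorVector implements ColumnVector {
    final long handle;
    IndicatorVector(long handle) { this.handle = handle; }
  }

  static final class PlaceholderVector implements ColumnVector {}

  static final class Batch {
    final List<ColumnVector> columns;
    Batch(List<ColumnVector> columns) { this.columns = columns; }
    int numCols() { return columns.size(); }
    ColumnVector column(int i) { return columns.get(i); }
  }

  enum BatchType { ZERO_COLUMN, LIGHT, HEAVY }

  // Counts how many times a batch gets classified.
  static int identifyCalls = 0;

  static BatchType identifyBatchType(Batch batch) {
    identifyCalls++;
    if (batch.numCols() == 0) {
      return BatchType.ZERO_COLUMN;
    }
    // Simplified: the real method also validates every remaining column.
    return batch.column(0) instanceof IndicatorVector ? BatchType.LIGHT : BatchType.HEAVY;
  }

  static boolean isZeroColumnBatch(Batch batch) {
    return identifyBatchType(batch) == BatchType.ZERO_COLUMN;
  }

  static boolean isLightBatch(Batch batch) {
    return identifyBatchType(batch) == BatchType.LIGHT;
  }

  static IndicatorVector getIndicatorVector(Batch input) {
    if (!isLightBatch(input)) { // second classification of the same batch
      throw new UnsupportedOperationException("Input batch is not light batch");
    }
    return (IndicatorVector) input.column(0);
  }

  // Same control flow as getNativeHandle above.
  static long getNativeHandle(Batch batch) {
    if (isZeroColumnBatch(batch)) { // first classification
      return -1L; // dummy stand-in for jniWrapper.getForEmptySchema(batch.numRows())
    }
    return getIndicatorVector(batch).handle;
  }
}
```

Calling `getNativeHandle` on a batch built from one `IndicatorVector` and one `PlaceholderVector` leaves `identifyCalls` at 2, while a zero-column batch stops after a single call.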
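`isLightBatch` calls `identifyBatchType` again, so a single `getNativeHandle`
call classifies the same batch twice. Because `identifyBatchType` is not cheap
(for a non-empty batch it checks the type of every column, as the
implementation below shows), I think we should call it only once.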
```java
private static BatchType identifyBatchType(ColumnarBatch batch) {
  if (batch.numCols() == 0) {
    return BatchType.ZERO_COLUMN;
  }
  final ColumnVector col0 = batch.column(0);
  if (col0 instanceof IndicatorVector) {
    // it's likely a light batch
    for (int i = 1; i < batch.numCols(); i++) {
      ColumnVector col = batch.column(i);
      if (!(col instanceof PlaceholderVector)) {
        throw new IllegalStateException(
            "Light batch should consist of one indicator vector "
                + "and (numCols - 1) placeholder vectors");
      }
    }
    return BatchType.LIGHT;
  }
  // it's likely a heavy batch
  for (int i = 0; i < batch.numCols(); i++) {
    ColumnVector col = batch.column(i);
    if (!(col instanceof ArrowWritableColumnVector)) {
      throw new IllegalStateException("Heavy batch should consist of arrow vectors");
    }
  }
  return BatchType.HEAVY;
}
```
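One possible shape for the fix is sketched below: `getNativeHandle` classifies the batch once, keeps the result in a local variable, and switches on it. This is only a sketch under assumptions, not a confirmed Gluten patch. `Batch`, `ColumnVector`, `IndicatorVector`, `PlaceholderVector`, `ArrowVector` (standing in for `ArrowWritableColumnVector`) and the `identifyCalls` counter are hypothetical simplified types, and the zero-column branch returns a dummy value in place of the JNI call.

```java
import java.util.List;

/** Hypothetical stand-ins; not the real Spark/Gluten classes. */
final class SingleCallDemo {
  interface ColumnVector {}

  static final class IndicatorVector implements ColumnVector {
    final long handle;
    IndicatorVector(long handle) { this.handle = handle; }
  }

  static final class PlaceholderVector implements ColumnVector {}

  static final class ArrowVector implements ColumnVector {} // stands in for ArrowWritableColumnVector

  static final class Batch {
    final List<ColumnVector> columns;
    Batch(List<ColumnVector> columns) { this.columns = columns; }
    int numCols() { return columns.size(); }
    ColumnVector column(int i) { return columns.get(i); }
  }

  enum BatchType { ZERO_COLUMN, LIGHT, HEAVY }

  static int identifyCalls = 0; // counts classifications

  // Same checks as identifyBatchType above, expressed with the stand-in types.
  static BatchType identifyBatchType(Batch batch) {
    identifyCalls++;
    if (batch.numCols() == 0) {
      return BatchType.ZERO_COLUMN;
    }
    if (batch.column(0) instanceof IndicatorVector) {
      for (int i = 1; i < batch.numCols(); i++) {
        if (!(batch.column(i) instanceof PlaceholderVector)) {
          throw new IllegalStateException(
              "Light batch should consist of one indicator vector "
                  + "and (numCols - 1) placeholder vectors");
        }
      }
      return BatchType.LIGHT;
    }
    for (int i = 0; i < batch.numCols(); i++) {
      if (!(batch.column(i) instanceof ArrowVector)) {
        throw new IllegalStateException("Heavy batch should consist of arrow vectors");
      }
    }
    return BatchType.HEAVY;
  }

  static long getNativeHandle(Batch batch) {
    // Classify exactly once and reuse the result for every decision below.
    final BatchType type = identifyBatchType(batch);
    switch (type) {
      case ZERO_COLUMN:
        return -1L; // dummy stand-in for jniWrapper.getForEmptySchema(batch.numRows())
      case LIGHT:
        return ((IndicatorVector) batch.column(0)).handle;
      default:
        throw new UnsupportedOperationException("Input batch is not light batch");
    }
  }
}
```

Switching on the enum keeps the single classification result in one place. An alternative with the same effect would be to pass the precomputed `BatchType` into private overloads of `isZeroColumnBatch` and `getIndicatorVector`, which would keep the existing helper structure.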
### Gluten version
main branch