mbutrovich commented on issue #4440: URL: https://github.com/apache/datafusion-comet/issues/4440#issuecomment-4662134735
@parthchandra thanks for digging up your earlier native C2R work, the no-speedup result there lines up with what I am seeing and is useful context. One angle I do not think we have considered yet: the JVM `CometColumnarToRowExec` is `CodegenSupport` (a copy of Spark's `ColumnarToRowExec` with the SPARK-50235 fix), so when it feeds a WholeStageCodegen stage it folds in as the codegen leaf and threads column values into the consumer loop with no `UnsafeRow` materialized. `CometNativeColumnarToRowExec` is a `ColumnarToRowTransition`, not `CodegenSupport`, so it becomes a hard boundary: the WSCG stage starts from an `InputAdapter` over its `RDD[InternalRow]`, and `NativeColumnarToRowConverter` deep-copies an `UnsafeRow` per row. `EliminateRedundantTransitions` picks native by default whenever the schema is supported. That asymmetry is invisible to the `.noop()` microbench, which has no WSCG consumer and lets escape analysis elide the copy. It also does not look like a read-speed tradeoff: native output already arrives as a `ColumnarBatch` of `CometVector`, and the getters are inlined `Platform` reads (`CometPlainVector.getInt` is `Platform.getInt(null, addr + rowId * 4L)`), so the JVM codegen path reads through the same accesses, just lazily inside the fused loop instead of eagerly per row. The one thing to watch is `getUTF8String` returning a zero-copy `fromAddress` view, which is fine for non-retaining consumers and for retaining ones that serialize to `UnsafeRow`, but not for a consumer that stashes the object reference. To get the apples-to-apples cluster number we both wanted, I am going to run TPC-DS SF1000 with `spark.comet.exec.columnarToRow.native.enabled` true vs false and post the per-query results here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
