mbutrovich commented on issue #4440:
URL: 
https://github.com/apache/datafusion-comet/issues/4440#issuecomment-4662134735

   @parthchandra thanks for digging up your earlier native C2R work, the 
no-speedup result there lines up with what I am seeing and is useful context.
   
   One angle I do not think we have considered yet: the JVM 
`CometColumnarToRowExec` is `CodegenSupport` (a copy of Spark's 
`ColumnarToRowExec` with the SPARK-50235 fix), so when it feeds a 
WholeStageCodegen stage it folds in as the codegen leaf and threads column 
values into the consumer loop with no `UnsafeRow` materialized. 
`CometNativeColumnarToRowExec` is a `ColumnarToRowTransition`, not 
`CodegenSupport`, so it becomes a hard boundary: the WSCG stage starts from an 
`InputAdapter` over its `RDD[InternalRow]`, and `NativeColumnarToRowConverter` 
deep-copies an `UnsafeRow` per row. `EliminateRedundantTransitions` picks 
native by default whenever the schema is supported.
   
   That asymmetry is invisible to the `.noop()` microbench, which has no WSCG 
consumer and lets escape analysis elide the copy. It also does not look like a 
read-speed tradeoff: native output already arrives as a `ColumnarBatch` of 
`CometVector`, and the getters are inlined `Platform` reads 
(`CometPlainVector.getInt` is `Platform.getInt(null, addr + rowId * 4L)`), so 
the JVM codegen path reads through the same accesses, just lazily inside the 
fused loop instead of eagerly per row. The one thing to watch is 
`getUTF8String` returning a zero-copy `fromAddress` view, which is fine for 
non-retaining consumers and for retaining ones that serialize to `UnsafeRow`, 
but not for a consumer that stashes the object reference.
   
   To get the apples-to-apples cluster number we both wanted, I am going to run 
TPC-DS SF1000 with `spark.comet.exec.columnarToRow.native.enabled` true vs 
false and post the per-query results here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to