andygrove commented on PR #4519:
URL: 
https://github.com/apache/datafusion-comet/pull/4519#issuecomment-4835067711

   A few smaller follow-ups after the latest round of changes:
   
   1. **Config doc wording.** `spark.comet.exec.transitionRevert.enabled` says 
the threshold counts "C2R and R2C transition pairs", but `countTransitions` 
only counts `ColumnarToRowTransition` nodes. The `maxTransitions` doc also 
still talks about full round-trips. Could we reword both so they match what the 
code counts? Something like "the number of columnar-to-row (C2R) transitions in 
a stage" would be clearer.
   
   2. **`CometSparkSessionExtensions` docstring is stale.** The pipeline 
diagrams at lines 54 and 77 still only list `EliminateRedundantTransitions` 
under `postColumnarTransitions`. Worth adding 
`RevertNativeForTransitionHeavyStages` there so the documented flow matches 
`CometExecColumnar`.
   
   3. **Rule ordering vs `EliminateRedundantTransitions`.** 
`EliminateRedundantTransitions` runs before this new rule. When the revert 
wraps a row-based subtree in `RowToColumnarExec` to feed a 
`CometShuffleExchangeExec` with `CometColumnarShuffle`, that R2C is exactly 
what `EliminateRedundantTransitions.scala:103-110` would have stripped, but it 
has already run. Not a correctness bug, just a missed cleanup. Could either 
swap the order or run a small cleanup after revert.
   
   4. **Dead guard in `insertTransitions`.** Line 170's 
`!node.isInstanceOf[QueryStageExec]` check is unreachable because 
`transformStageUp` already skips stage-boundary children. Safe to drop.
   
   5. **`lazy val` for conf reads.** `enabled` and `maxTransitions` are safe 
today only because `CometExecColumnar.postColumnarTransitions` is a `def` that 
rebuilds the rule per access. If anyone changes that to a `val`, the rule would 
freeze conf at first read. A `private def` would be more defensive.
   
   Overall the direction looks good. The stage-boundary fix and the new 
regression tests address my earlier concern, and defaulting to `false` keeps 
this low-risk.
   
   Disclosure: I drafted this comment with help from AI (Claude Code).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to