bmorck opened a new issue, #1092:
URL: https://github.com/apache/datafusion-comet/issues/1092

   ### Describe the bug
   
   I'm working on some internal benchmarks using Comet with Spark 3.3 and 
Iceberg. To support iceberg, we are including the config, which inserts the 
`CometSparkToColumnar` nodes following the BatchScans
   
   > "spark.comet.convert.parquet.enabled" = "true"
       "spark.comet.sparkToColumnar.supportedOperatorList" = "BatchScan"
   
   In the following example, we see that there is a missing `ColumnarToRow` 
preceding the `Project (19)` operator. This results in the query failing. After 
analyzing the query optimization, I've found that the 
`EliminateRedundantTransitions` rule, removed a `CometSparkToColumnar` and 
subsequent `ColumnarToRow` following the  `BatchScan (20)` operator, due to the 
subsequent `Filter (21)` operator requiring row format. 
   
   `  CometSort (27)
      +- CometColumnarExchange (26)
         +- BroadcastNestedLoopJoin Cross BuildRight (25)
            :- Project (19)
            :  +- CometSortMergeJoin (18)
            :     :- CometSort (6)
            :     :  +- ShuffleQueryStage (5)
            :     :     +- CometExchange (4)
            :     :        +- CometFilter (3)
            :     :           +- CometSparkToColumnar (2)
            :     :              +- BatchScan (1)
            :     +- CometSort (17)
            :        +- CometExchange (16)
            :           +- CometFilter (15)
            :              +- CometHashAggregate (14)
            :                 +- ShuffleQueryStage (13)
            :                    +- CometExchange (12)
            :                       +- CometHashAggregate (11)
            :                          +- CometProject (10)
            :                             +- CometFilter (9)
            :                                +- CometSparkToColumnar (8)
            :                                   +- BatchScan (7)
            +- BroadcastQueryStage (24)
               +- BroadcastExchange (23)
                  +- * Project (22)
                     +- * Filter (21)
                        +- BatchScan (20)`
                        
                        
   I've modified the filter to be able to be converted to native and we see the 
query inserts the appropriate `ColumnarToRow` transitions, as shown below. It's 
unclear if this is a bug in Sparks `ApplyColumnarRulesAndInsertTransitions` 
rule or if this is unique to Comet, but it seems like incorrect behavior when 
using the `CometSparkToColumnarNode`
   
   `   ColumnarToRow (34)
      +- CometSort (33)
         +- AQEShuffleRead (32)
            +- ShuffleQueryStage (31), Statistics(sizeInBytes=4.2 KiB, 
rowCount=36)
               +- CometColumnarExchange (30)
                  +- BroadcastNestedLoopJoin Cross BuildRight (29)
                     :- Project (21)
                     :  +- ColumnarToRow (20)
                     :     +- CometBroadcastHashJoin (19)
                     :        :- BroadcastQueryStage (8), 
Statistics(sizeInBytes=319.7 KiB, rowCount=583)
                     :        :  +- CometBroadcastExchange (7)
                     :        :     +- AQEShuffleRead (6)
                     :        :        +- ShuffleQueryStage (5), 
Statistics(sizeInBytes=409.3 KiB, rowCount=583)
                     :        :           +- CometExchange (4)
                     :        :              +- CometFilter (3)
                     :        :                 +- CometSparkToColumnar (2)
                     :        :                    +- BatchScan (1)
                     :        +- CometFilter (18)
                     :           +- CometHashAggregate (17)
                     :              +- AQEShuffleRead (16)
                     :                 +- ShuffleQueryStage (15), 
Statistics(sizeInBytes=2040.0 B, rowCount=10)
                     :                    +- CometExchange (14)
                     :                       +- CometHashAggregate (13)
                     :                          +- CometProject (12)
                     :                             +- CometFilter (11)
                     :                                +- CometSparkToColumnar 
(10)
                     :                                   +- BatchScan (9)
                     +- BroadcastQueryStage (28), Statistics(sizeInBytes=864.0 
B, rowCount=36)
                        +- BroadcastExchange (27)
                           +- ColumnarToRow (26)
                              +- CometProject (25)
                                 +- CometFilter (24)
                                    +- CometSparkToColumnar (23)
                                       +- BatchScan (22)
    `
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to