l0kr opened a new issue, #1588:
URL: https://github.com/apache/datafusion-comet/issues/1588

   ### Describe the bug
   
   Loading Parquet with the Spark scan, converting the scan output to native, and then collecting the DataFrame without any transformation throws an exception: `java.lang.ClassCastException: class org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to class org.apache.spark.sql.catalyst.InternalRow`
   
   More detailed stacktrace:
   ```
   Caused by: java.lang.ClassCastException: class org.apache.spark.sql.vectorized.ColumnarBatch cannot be cast to class org.apache.spark.sql.catalyst.InternalRow (org.apache.spark.sql.vectorized.ColumnarBatch and org.apache.spark.sql.catalyst.InternalRow are in unnamed module of loader 'app')
        at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
        at org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:389)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:891)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:891)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:367)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:331)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:92)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
        at org.apache.spark.scheduler.Task.run(Task.scala:139)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:554)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1529)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:557)
   ```
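   The frames above suggest that `SparkPlan.getByteArrayRdd` is consuming the child plan's output as an `Iterator[InternalRow]` while the converted scan is actually producing `ColumnarBatch`es. Because the iterator's element type is erased at runtime, the mismatch only surfaces when the first element is fetched (`Iterator.scala:461`). A minimal, self-contained sketch of that erasure behavior, using stand-in classes rather than Spark's:

   ```scala
   object ErasureDemo extends App {
     // Stand-ins for Spark's classes, only to demonstrate the failure mode.
     class ColumnarBatch
     class InternalRow

     val batches: Iterator[Any] = Iterator(new ColumnarBatch)

     // Casting the iterator itself succeeds: the element type is erased.
     val rows = batches.asInstanceOf[Iterator[InternalRow]]

     // The ClassCastException only appears when an element is consumed.
     val first: InternalRow = rows.next() // throws here
   }
   ```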
   
   ### Steps to reproduce
   
   ```scala
     test("Reproduce error with collect") {
       withSQLConf(
         CometConf.COMET_NATIVE_SCAN_ENABLED.key -> "false",
         CometConf.COMET_CONVERT_FROM_PARQUET_ENABLED.key -> "true"
       ) {
         withTempDir { dir =>
           var df = spark
             .range(10000)
             .selectExpr("id as key", "id % 8 as value")
             .toDF("key", "value")
   
           df.write.mode("overwrite").parquet(dir.toString)
           df = spark.read.parquet(dir.toString)
           df.collect()
         }
       }
     }
   ```
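   
   For reference, a standalone sketch of the same reproduction outside the Comet test harness. It assumes the string keys `spark.comet.scan.enabled` and `spark.comet.convert.parquet.enabled` behind the two `CometConf` entries above, and that the Comet plugin jar is on the classpath; adjust for your Comet version:

   ```scala
   import org.apache.spark.sql.SparkSession

   object CollectRepro extends App {
     // Assumed config keys and plugin class; see CometConf for the exact names.
     val spark = SparkSession.builder()
       .master("local[*]")
       .appName("comet-collect-repro")
       .config("spark.plugins", "org.apache.spark.CometPlugin")
       .config("spark.comet.scan.enabled", "false")           // use Spark's Parquet scan
       .config("spark.comet.convert.parquet.enabled", "true") // convert scan output for Comet
       .getOrCreate()

     val path = java.nio.file.Files.createTempDirectory("comet-repro").toString
     spark.range(10000)
       .selectExpr("id as key", "id % 8 as value")
       .write.mode("overwrite").parquet(path)

     // Collect with no transformation; this is where the ClassCastException appears.
     spark.read.parquet(path).collect()

     spark.stop()
   }
   ```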
   
   ### Expected behavior
   
   No exception thrown
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

