Kontinuation opened a new issue, #1844:
URL: https://github.com/apache/datafusion-comet/issues/1844

   ### Describe the bug
   
   Reading a Delta table with Comet enabled fails with the following error:
   
   ```
   25/05/30 09:53:47 ERROR Executor: Exception in task 3.0 in stage 0.0 (TID 3)
    org.apache.comet.CometNativeException: StructBuilder (Schema { fields: [Field { name: "provider", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "options", data_type: Map(Field { name: "entries", data_type: Struct([Field { name: "key", data_type: Utf8, nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: "value", data_type: Utf8, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]), nullable: false, dict_id: 0, dict_is_ordered: false, metadata: {} }, false), nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }], metadata: {} }) and field_builder with index 0 (Utf8) are of unequal lengths: (1 != 0).
           at std::backtrace::Backtrace::create(__internal__:0)
           at comet::errors::init::{{closure}}(__internal__:0)
           at std::panicking::rust_panic_with_hook(__internal__:0)
           at std::panicking::begin_panic_handler::{{closure}}(__internal__:0)
           at std::sys::backtrace::__rust_end_short_backtrace(__internal__:0)
           at _rust_begin_unwind(__internal__:0)
           at core::panicking::panic_fmt(__internal__:0)
           at arrow_array::builder::struct_builder::StructBuilder::validate_content::{{closure}}::panic_cold_display(__internal__:0)
           at arrow_array::builder::struct_builder::StructBuilder::validate_content(__internal__:0)
           at arrow_array::builder::struct_builder::StructBuilder::finish(__internal__:0)
           at <arrow_array::builder::struct_builder::StructBuilder as arrow_array::builder::ArrayBuilder>::finish(__internal__:0)
           at arrow_array::builder::struct_builder::StructBuilder::finish(__internal__:0)
           at <arrow_array::builder::struct_builder::StructBuilder as arrow_array::builder::ArrayBuilder>::finish(__internal__:0)
           at <core::iter::adapters::GenericShunt<I,R> as core::iter::traits::iterator::Iterator>::next(__internal__:0)
           at comet::execution::shuffle::row::process_sorted_row_partition(__internal__:0)
           at comet::execution::jni_api::Java_org_apache_comet_Native_writeSortedFileNative::{{closure}}::{{closure}}(__internal__:0)
           at _Java_org_apache_comet_Native_writeSortedFileNative(__internal__:0)
        at org.apache.comet.Native.writeSortedFileNative(Native Method)
        at org.apache.spark.sql.comet.execution.shuffle.SpillWriter.doSpilling(SpillWriter.java:187)
        at org.apache.spark.sql.comet.execution.shuffle.CometDiskBlockWriter$ArrowIPCWriter.doSpilling(CometDiskBlockWriter.java:401)
        at org.apache.spark.sql.comet.execution.shuffle.CometDiskBlockWriter.close(CometDiskBlockWriter.java:304)
        at org.apache.spark.sql.comet.execution.shuffle.CometBypassMergeSortShuffleWriter.write(CometBypassMergeSortShuffleWriter.java:244)
        at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:104)
        at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:54)
        at org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:166)
        at org.apache.spark.scheduler.Task.run(Task.scala:141)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally(SparkErrorUtils.scala:64)
        at org.apache.spark.util.SparkErrorUtils.tryWithSafeFinally$(SparkErrorUtils.scala:61)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:94)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:623)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
        at java.base/java.lang.Thread.run(Thread.java:840)
   ```
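   
   For context (this is not Comet code, just the invariant the panic refers to): arrow-rs's `StructBuilder::finish()` validates that every child field builder holds the same number of rows as the struct's validity buffer. A minimal standalone Rust sketch, assuming the `arrow-array`/`arrow-schema` crates as dependencies, that trips the same `validate_content` check:
   
    ```rust
    use arrow_array::builder::StructBuilder;
    use arrow_schema::{DataType, Field};
    
    fn main() {
        // One Utf8 child field, mirroring the "provider" field in the schema above.
        let mut builder =
            StructBuilder::from_fields(vec![Field::new("provider", DataType::Utf8, true)], 1);
    
        // Mark one struct slot as valid but never append a value to the Utf8 child,
        // so the struct is length 1 while the child builder is still length 0.
        builder.append(true);
    
        // finish() runs validate_content(), which panics with
        // "... are of unequal lengths: (1 != 0)" -- the same check seen in the trace.
        let _ = builder.finish();
    }
    ```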
   
   ### Steps to reproduce
   
   The following PySpark code reproduces this issue locally.
   
   ```python
   from pyspark.sql import SparkSession
   
   COMET_JAR = "path/to/comet/jar"
   
   spark = (SparkSession.builder
       .master("local[*]")
       .config("spark.jars", COMET_JAR)
       .config("spark.jars.packages", "io.delta:delta-spark_2.12:3.3.1")
       .config("spark.sql.extensions", 
"io.delta.sql.DeltaSparkSessionExtension,org.apache.comet.CometSparkSessionExtensions")
       .config("spark.sql.catalog.spark_catalog", 
"org.apache.spark.sql.delta.catalog.DeltaCatalog")
       .config("spark.driver.extraClassPath", COMET_JAR)
       .config("spark.executor.extraClassPath", COMET_JAR)
       .config("spark.shuffle.manager", 
"org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager")
       .config("spark.comet.enabled", "true")
       .config("spark.comet.exec.shuffle.enabled", "true")
       .config("spark.comet.exec.shuffle.mode", "auto")
       .config("spark.comet.exec.shuffle.fallbackToColumnar", "true")
       .getOrCreate()
   )
   
   spark.range(0, 1000).write.format("delta").save("delta_table")
   spark.read.format("delta").load("delta_table").show() # this line throws the 
aforementioned exception.
   ```
   
   
   ### Expected behavior
   
   The Delta table should be read successfully.
   
   ### Additional context
   
   Spark version: Spark 3.5.4
   Comet version: commit https://github.com/apache/datafusion-comet/commit/7cf2e9dc9f1cba4f172ea6bdc1a6ac23c859b4d7

