BjarkeTornager opened a new issue, #1026: URL: https://github.com/apache/datafusion-comet/issues/1026
### Describe the bug

I have followed the [building from source guide](https://datafusion.apache.org/comet/user-guide/source.html) since I am on macOS. The only difference is that I ran the build against Spark 3.3:

```
make release-nogit PROFILES="-Pspark-3.3"
```

With the jar produced by the build I can run Spark with Comet fine in the terminal like this:

```
export COMET_JAR=apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar
$SPARK_HOME/bin/spark-shell \
  --jars $COMET_JAR \
  --conf spark.driver.extraClassPath=$COMET_JAR \
  --conf spark.executor.extraClassPath=$COMET_JAR \
  --conf spark.plugins=org.apache.spark.CometPlugin \
  --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
  --conf spark.comet.explainFallback.enabled=true \
  --conf spark.memory.offHeap.enabled=true \
  --conf spark.memory.offHeap.size=16g
```

However, when I add the same Comet settings to the Spark config options in my own project like this:

```
"spark.jars": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.driver.extraClassPath": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.executor.extraClassPath": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
"spark.plugins": "org.apache.spark.CometPlugin",
"spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
"spark.comet.explainFallback.enabled": "true",
"spark.memory.offHeap.enabled": "true",
"spark.memory.offHeap.size": "16g",
```

and run a Spark test using pytest (which always succeeds when the Comet configuration above is not added), I get the following exception:

```
----------------------------- Captured stdout call -----------------------------
24/10/20 07:25:32 WARN CometSparkSessionExtensions$CometExecRule: Comet cannot execute some parts of this plan natively (set spark.comet.explainFallback.enabled=false to disable this logging):
HashAggregate
+- Exchange [COMET: Exchange is not native because the following children are not native (HashAggregate)]
   +- HashAggregate [COMET: HashAggregate is not native because the following children are not native (Project)]
      +- Project [COMET: Project is not native because the following children are not native (BroadcastHashJoin)]
         +- BroadcastHashJoin [COMET: BroadcastHashJoin is not native because the following children are not native (Scan ExistingRDD, BroadcastExchange)]
            :- Scan ExistingRDD [COMET: Scan ExistingRDD is not supported]
            +- BroadcastExchange
               +- CometProject
                  +- CometFilter
                     +- CometScanWrapper
24/10/20 07:25:32 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
java.lang.ExceptionInInitializerError
	at org.apache.comet.package$.<init>(package.scala:90)
	at org.apache.comet.package$.<clinit>(package.scala)
	at org.apache.comet.vector.NativeUtil.<init>(NativeUtil.scala:48)
	at org.apache.comet.CometExecIterator.<init>(CometExecIterator.scala:52)
	at org.apache.spark.sql.comet.CometNativeExec.createCometExecIter$1(operators.scala:223)
	at org.apache.spark.sql.comet.CometNativeExec.$anonfun$doExecuteColumnar$6(operators.scala:298)
	at org.apache.spark.sql.comet.ZippedPartitionsRDD.compute(ZippedPartitionsRDD.scala:43)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
	at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
	at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
	at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
	at org.apache.spark.scheduler.Task.run(Task.scala:136)
	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.comet.CometRuntimeException: Could not find comet-git-info.properties
	at org.apache.comet.package$CometBuildInfo$.<init>(package.scala:57)
	at org.apache.comet.package$CometBuildInfo$.<clinit>(package.scala)
	... 23 more
```

Searching the datafusion-comet source code, the error appears to come from [here](https://github.com/apache/datafusion-comet/blob/4033687378feb113ecd736c1edc4c6fb35f31eb6/common/src/main/scala/org/apache/comet/package.scala#L57).

Details of environment:
- macOS Sonoma version 14.6
- Spark 3.3.4 using PySpark
- Scala version 2.12

### Steps to reproduce

_No response_

### Expected behavior

_No response_

### Additional context

_No response_
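For context, the config options from the issue could be wired into a PySpark session roughly as follows. This is a minimal sketch: the helper name `comet_conf` is illustrative and not from the issue, and only the dict contents come from the report.

```
# Sketch: build the Comet-related Spark config dict from the issue.
# The helper name `comet_conf` is hypothetical, not part of Comet.

def comet_conf(jar_path):
    """Return the Comet Spark settings reported in the issue for a given jar."""
    return {
        "spark.jars": jar_path,
        "spark.driver.extraClassPath": jar_path,
        "spark.executor.extraClassPath": jar_path,
        "spark.plugins": "org.apache.spark.CometPlugin",
        "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
        "spark.comet.explainFallback.enabled": "true",
        "spark.memory.offHeap.enabled": "true",
        "spark.memory.offHeap.size": "16g",
    }

# Applying it (requires pyspark, shown as comments only):
#   from pyspark.sql import SparkSession
#   builder = SparkSession.builder.appName("comet-test")
#   for key, value in comet_conf(COMET_JAR_PATH).items():
#       builder = builder.config(key, value)
#   spark = builder.getOrCreate()
```

Passing the same jar to `spark.jars` and both `extraClassPath` settings matters: the Comet plugin and shuffle manager must be on the driver and executor classpaths at JVM startup, which is what the working `spark-shell` invocation above does.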
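Since the root cause is a missing `comet-git-info.properties` resource, one quick diagnostic is to check whether the built jar actually contains that file. A stdlib-only sketch (the function name `has_git_info` is illustrative):

```
import zipfile

def has_git_info(jar_path):
    """Return True if the jar (a zip archive) contains comet-git-info.properties."""
    with zipfile.ZipFile(jar_path) as jar:
        return any(
            name.endswith("comet-git-info.properties")
            for name in jar.namelist()
        )

# Example (path from the issue):
#   has_git_info("apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar")
```

If this returns `False` for the jar produced by `make release-nogit`, that would confirm the build profile skipped generating the git-info resource that `CometBuildInfo` tries to load.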
