BjarkeTornager opened a new issue, #1026:
URL: https://github.com/apache/datafusion-comet/issues/1026

   ### Describe the bug
   
   I have followed the [building from source 
guide](https://datafusion.apache.org/comet/user-guide/source.html) since I am 
on macOS. The only difference is that I built against Spark 3.3: `make 
release-nogit PROFILES="-Pspark-3.3"`.
   
   With the jar produced by the build, I can run Spark with Comet without issue 
in the terminal like this:
   ```
   export COMET_JAR=apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar

   $SPARK_HOME/bin/spark-shell \
       --jars $COMET_JAR \
       --conf spark.driver.extraClassPath=$COMET_JAR \
       --conf spark.executor.extraClassPath=$COMET_JAR \
       --conf spark.plugins=org.apache.spark.CometPlugin \
       --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
       --conf spark.comet.explainFallback.enabled=true \
       --conf spark.memory.offHeap.enabled=true \
       --conf spark.memory.offHeap.size=16g
   ```
   
   However, when I add Comet to the Spark config options in my own project like 
this:
   ```
   "spark.jars": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
   "spark.driver.extraClassPath": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
   "spark.executor.extraClassPath": "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar",
   "spark.plugins": "org.apache.spark.CometPlugin",
   "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
   "spark.comet.explainFallback.enabled": "true",
   "spark.memory.offHeap.enabled": "true",
   "spark.memory.offHeap.size": "16g",
   ```
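
   For reference, this is roughly how those options are applied in PySpark. This is a minimal sketch, not the reporter's actual project code; the jar path is the one from the build above:

   ```python
   # Sketch of applying the Comet configuration to a PySpark session builder.
   # The jar path assumes the source build described above; adjust as needed.
   COMET_JAR = "apache-datafusion-comet-0.3.0/spark/target/comet-spark-spark3.3_2.12-0.3.0.jar"

   comet_conf = {
       "spark.jars": COMET_JAR,
       "spark.driver.extraClassPath": COMET_JAR,
       "spark.executor.extraClassPath": COMET_JAR,
       "spark.plugins": "org.apache.spark.CometPlugin",
       "spark.shuffle.manager": "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager",
       "spark.comet.explainFallback.enabled": "true",
       "spark.memory.offHeap.enabled": "true",
       "spark.memory.offHeap.size": "16g",
   }

   # On Spark 3.3, apply each option individually:
   #     builder = SparkSession.builder
   #     for key, value in comet_conf.items():
   #         builder = builder.config(key, value)
   #     spark = builder.getOrCreate()
   ```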
   
   and then run a Spark test using pytest (the test always succeeds when the 
Comet configuration above is omitted), I get the following exception:
   ```
   ---------------------------------------------------------------------------- 
Captured stdout call 
-----------------------------------------------------------------------------
   24/10/20 07:25:32 WARN CometSparkSessionExtensions$CometExecRule: Comet 
cannot execute some parts of this plan natively (set 
spark.comet.explainFallback.enabled=false to disable this logging):
   HashAggregate
   +-  Exchange [COMET: Exchange is not native because the following children 
are not native (HashAggregate)]
      +-  HashAggregate [COMET: HashAggregate is not native because the 
following children are not native (Project)]
         +-  Project [COMET: Project is not native because the following 
children are not native (BroadcastHashJoin)]
            +-  BroadcastHashJoin [COMET: BroadcastHashJoin is not native 
because the following children are not native (Scan ExistingRDD, 
BroadcastExchange)]
               :-  Scan ExistingRDD [COMET: Scan ExistingRDD is not supported]
               +- BroadcastExchange
                  +- CometProject
                     +- CometFilter
                        +- CometScanWrapper
   
   24/10/20 07:25:32 ERROR Executor: Exception in task 0.0 in stage 1.0 (TID 1)
   java.lang.ExceptionInInitializerError
        at org.apache.comet.package$.<init>(package.scala:90)
        at org.apache.comet.package$.<clinit>(package.scala)
        at org.apache.comet.vector.NativeUtil.<init>(NativeUtil.scala:48)
        at org.apache.comet.CometExecIterator.<init>(CometExecIterator.scala:52)
        at 
org.apache.spark.sql.comet.CometNativeExec.createCometExecIter$1(operators.scala:223)
        at 
org.apache.spark.sql.comet.CometNativeExec.$anonfun$doExecuteColumnar$6(operators.scala:298)
        at 
org.apache.spark.sql.comet.ZippedPartitionsRDD.compute(ZippedPartitionsRDD.scala:43)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at 
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:365)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:329)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:136)
        at 
org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:548)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1504)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:551)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
   Caused by: org.apache.comet.CometRuntimeException: Could not find 
comet-git-info.properties
        at org.apache.comet.package$CometBuildInfo$.<init>(package.scala:57)
        at org.apache.comet.package$CometBuildInfo$.<clinit>(package.scala)
        ... 23 more
   ```
   
   Searching the datafusion-comet source code, the error appears to come from 
[here](https://github.com/apache/datafusion-comet/blob/4033687378feb113ecd736c1edc4c6fb35f31eb6/common/src/main/scala/org/apache/comet/package.scala#L57).
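
   Since the exception says `comet-git-info.properties` could not be found, one way to narrow this down is to check whether the built jar actually bundles that resource. A small sketch (the resource name is taken from the error message; its exact location inside the jar is an assumption, so any path ending in that name is accepted):

   ```python
   import zipfile

   def has_git_info(jar_path: str) -> bool:
       """Return True if the jar contains a comet-git-info.properties entry.

       A jar is a zip archive, so we can list its entries directly. The exact
       directory of the resource inside the jar is not known here, so we match
       any entry ending with the file name from the error message.
       """
       with zipfile.ZipFile(jar_path) as jar:
           return any(
               name.endswith("comet-git-info.properties")
               for name in jar.namelist()
           )
   ```

   Running this against the jar built with `make release-nogit` versus one built with plain `make release` might show whether the `-nogit` target skips generating the build-info resource.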
   
   Environment details:
   - macOS Sonoma version 14.6
   - Spark 3.3.4 using pyspark
   - Scala version 2.12
   
   ### Steps to reproduce
   
   _No response_
   
   ### Expected behavior
   
   _No response_
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to