neyama opened a new issue, #1045:
URL: https://github.com/apache/datafusion-comet/issues/1045

   ### Describe the bug
   
   A `java.util.NoSuchElementException` for `spark.executor.memory` occurs in `org.apache.spark.CometDriverPlugin.init` (Plugins.scala:56).
   
   ```console
   java.util.NoSuchElementException: spark.executor.memory
           at org.apache.spark.SparkConf.$anonfun$get$1(SparkConf.scala:245)
           at scala.Option.getOrElse(Option.scala:189)
           at org.apache.spark.SparkConf.get(SparkConf.scala:245)
           at org.apache.spark.SparkConf.$anonfun$getSizeAsMb$1(SparkConf.scala:355)
           at scala.runtime.java8.JFunction0$mcJ$sp.apply(JFunction0$mcJ$sp.java:23)
           at org.apache.spark.SparkConf.catchIllegalValue(SparkConf.scala:482)
           at org.apache.spark.SparkConf.getSizeAsMb(SparkConf.scala:355)
           at org.apache.spark.CometDriverPlugin.init(Plugins.scala:56)
           at org.apache.spark.internal.plugin.DriverPluginContainer.$anonfun$driverPlugins$1(PluginContainer.scala:53)
           at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
           at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
           at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
           at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
           at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
           at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)
           at scala.collection.AbstractTraversable.flatMap(Traversable.scala:108)
           at org.apache.spark.internal.plugin.DriverPluginContainer.<init>(PluginContainer.scala:46)
           at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:210)
           at org.apache.spark.internal.plugin.PluginContainer$.apply(PluginContainer.scala:193)
           at org.apache.spark.SparkContext.<init>(SparkContext.scala:574)
           at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
           at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
           at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
           at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
           at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
           at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
           at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
           at py4j.Gateway.invoke(Gateway.java:238)
           at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
           at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
           at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
           at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
           at java.base/java.lang.Thread.run(Thread.java:829)
   ```
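   
   For context, the trace shows `SparkConf.getSizeAsMb(key)` delegating to `SparkConf.get(key)`, which throws when the key is absent; the two-argument overload falls back to a supplied default instead. A minimal sketch of just that behavior, outside Comet (needs only `spark-core` on the classpath):
   
   ```scala
   import org.apache.spark.SparkConf
   
   object Repro extends App {
     // An empty conf models a spark-submit run where spark.executor.memory
     // was never set (`false` skips loading spark.* system properties).
     val conf = new SparkConf(false)
   
     // Throws java.util.NoSuchElementException: spark.executor.memory,
     // because the single-argument overload has no fallback:
     // conf.getSizeAsMb("spark.executor.memory")
   
     // The two-argument overload returns the supplied default instead:
     println(conf.getSizeAsMb("spark.executor.memory", "1g")) // 1024
   }
   ```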
   
   ### Steps to reproduce
   
   #### Prerequisites
   + Ubuntu 20.04.6 LTS
   + openjdk-11-jdk-headless
   + Apache Spark 3.5.3 (Pre-built for Apache Hadoop 3.3 and later)
   
   #### How to reproduce the bug
   Build `datafusion-comet` from scratch.
   
   ```console
   $ git clone https://github.com/apache/datafusion-comet.git
   $ cd datafusion-comet
   $ make release PROFILES="-Pspark-3.5"
   ```
   
   To reproduce the bug, run `datafusion-benchmarks` with TPC-H (scale factor 1) without specifying `--conf spark.executor.memory=<size>`.
   ```console
   $ git clone https://github.com/apache/datafusion-benchmarks.git
   $ cd datafusion-benchmarks/runners/datafusion-comet
   $ export COMET_JAR=<path-to>/datafusion-comet/spark/target/comet-spark-spark3.5_2.12-0.4.0-SNAPSHOT.jar
   $ $SPARK_HOME/bin/spark-submit \
       --master "local[*]" \
       --jars $COMET_JAR \
       --conf spark.driver.extraClassPath=$COMET_JAR \
       --conf spark.executor.extraClassPath=$COMET_JAR \
       --conf spark.plugins=org.apache.spark.CometPlugin \
       --conf spark.comet.enabled=true \
       --conf spark.comet.exec.enabled=true \
       --conf spark.comet.cast.allowIncompatible=true \
       --conf spark.comet.exec.shuffle.enabled=true \
       --conf spark.comet.exec.shuffle.mode=auto \
       --conf spark.shuffle.manager=org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager \
       tpcbench.py \
       --benchmark tpch \
       --data <path-to-tpch-sf1-data> \
       --queries ../../tpch/queries/ \
       --output .
   ```
   
   When I specify `--conf spark.executor.memory=<size>` explicitly, the error does not occur.
   
   I suspect that no value for `spark.executor.memory` is set in the `SparkConf` at the time `org.apache.spark.CometDriverPlugin.init` (Plugins.scala:56) runs, so the call to `org.apache.spark.SparkConf.getSizeAsMb` (SparkConf.scala:355) throws. The plugin should therefore either pass a default value to `getSizeAsMb`, or check whether `spark.executor.memory` is present in the conf and call `getSizeAsMb` only when it is set, as sketched below.
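   
   A minimal sketch of both options (the object and method names here are illustrative, not the actual Comet code):
   
   ```scala
   import org.apache.spark.SparkConf
   
   // Hypothetical helpers sketching the two proposed fixes; the real
   // logic lives in org.apache.spark.CometDriverPlugin.init (Plugins.scala).
   object ExecutorMemory {
     // Option 1: fall back to Spark's documented default for
     // spark.executor.memory ("1g"); this overload never throws
     // NoSuchElementException for a missing key.
     def mbOrDefault(conf: SparkConf): Long =
       conf.getSizeAsMb("spark.executor.memory", "1g")
   
     // Option 2: read the key only when the user actually set it.
     def mbIfSet(conf: SparkConf): Option[Long] =
       if (conf.contains("spark.executor.memory")) {
         Some(conf.getSizeAsMb("spark.executor.memory"))
       } else {
         None
       }
   }
   ```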
   
   ### Expected behavior
   
   The TPC-H benchmark completes successfully.
   
   ### Additional context
   
   _No response_

