LukMRVC opened a new issue, #1460:
URL: https://github.com/apache/datafusion-comet/issues/1460

   ### Describe the bug
   
   When initializing a Spark session in unified memory mode, Comet overrides `spark.executor.memoryOverhead` with a value requiring more than 100 GB of memory.
   
   
https://github.com/apache/datafusion-comet/blob/928e1a2bbea6b19bdcefd620395fd7ffffc4773c/spark/src/main/scala/org/apache/spark/Plugins.scala#L65-L72
   
   First, the `if` condition appears to be inverted. Second, `CometSparkSessionExtensions.getCometShuffleMemorySize` returns a size in bytes, which is then added to a value expressed in MB. As a result, the memory overhead is overridden with an absurdly large value.
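   To make the two points above concrete, here is a hypothetical Scala sketch of the arithmetic. The names `MemoryOverheadSketch`, `buggyOverhead`, `fixedOverhead`, the `unifiedMemory` flag and the 512 MiB shuffle size are assumptions for illustration only; this is not the actual code in `Plugins.scala`. It also assumes, as this report suggests, that shuffle memory should only be added when unified memory is disabled.

   ```scala
   // Hypothetical sketch: names, the `unifiedMemory` flag and all values are
   // assumptions for illustration, not Comet's actual code in Plugins.scala.
   object MemoryOverheadSketch {
     val BytesPerMiB: Long = 1024L * 1024L

     // Buggy shape described in the report: the condition is inverted, and the
     // shuffle size (in bytes) is added to an overhead expressed in MiB.
     def buggyOverhead(unifiedMemory: Boolean, execOverheadMiB: Long, shuffleBytes: Long): String =
       if (unifiedMemory) s"${execOverheadMiB + shuffleBytes}M"
       else s"${execOverheadMiB}M"

     // Expected shape: only add shuffle memory outside unified mode, and convert
     // bytes to MiB (rounding up) before adding.
     def fixedOverhead(unifiedMemory: Boolean, execOverheadMiB: Long, shuffleBytes: Long): String =
       if (unifiedMemory) s"${execOverheadMiB}M"
       else {
         val shuffleMiB = (shuffleBytes + BytesPerMiB - 1) / BytesPerMiB
         s"${execOverheadMiB + shuffleMiB}M"
       }

     def main(args: Array[String]): Unit = {
       val execOverheadMiB = 5L * 1024L      // spark.executor.memoryOverhead = 5G
       val shuffleBytes = 512L * BytesPerMiB // hypothetical 512 MiB shuffle size, in bytes
       // unified mode (first argument = true)
       println(buggyOverhead(true, execOverheadMiB, shuffleBytes))  // 536876032M
       println(fixedOverhead(true, execOverheadMiB, shuffleBytes))  // 5120M
       // non-unified mode
       println(fixedOverhead(false, execOverheadMiB, shuffleBytes)) // 5632M
     }
   }
   ```

   With a 5G overhead and a 512 MiB shuffle size, the buggy path yields `536876032M` (roughly 512 TiB), while the fixed path yields `5120M` in unified mode and `5632M` otherwise.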
   
   ### Steps to reproduce
   
   Create a Spark session and inspect the resulting value of `spark.executor.memoryOverhead`.
   
   ```scala
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder()
       .appName("sample")
       // Spark memory settings
       .config("spark.executor.memory", "2G")
       .config("spark.executor.memoryOverhead", "5G")
       .config("spark.driver.memory", "2G")
       .config("spark.driver.memoryOverhead", "2G")
       // Comet and off-heap settings
       .config("spark.comet.exec.enabled", "true")
       .config("spark.shuffle.manager",
         "org.apache.spark.sql.comet.execution.shuffle.CometShuffleManager")
       .config("spark.comet.shuffle.enabled", "true")
       .config("spark.comet.explainFallback.enabled", "true")
       .config("spark.memory.offHeap.enabled", "true")
       .config("spark.memory.offHeap.size", "3G")
       .config("spark.comet.memory.overhead.factor", "0.2")
       .config("spark.comet.convert.json.enabled", "true")
       .config("spark.plugins", "org.apache.spark.CometPlugin")
       .config("spark.comet.enabled", "true")
       .getOrCreate()

   // Inspect the executor memory overhead after the Comet plugin has initialized
   println(spark.sparkContext.getConf.get("spark.executor.memoryOverhead"))
   ```
   
   ### Expected behavior
   
   Comet should override the executor memory overhead with a reasonable amount: either the value set by `spark.comet.memoryOverhead`, or a value computed consistently in MB.
   
   ### Additional context
   
   _No response_

