GitHub user andrewor14 commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3082#discussion_r19840424
  
    --- Diff: core/src/main/scala/org/apache/spark/ExecutorAllocationManager.scala ---
    @@ -124,6 +125,22 @@ private[spark] class ExecutorAllocationManager(sc: SparkContext) extends Logging
           throw new SparkException(s"spark.dynamicAllocation.minExecutors ($minNumExecutors) must " +
             s"be less than or equal to spark.dynamicAllocation.maxExecutors ($maxNumExecutors)!")
         }
    +    if (schedulerBacklogTimeout <= 0) {
    +      throw new SparkException("spark.dynamicAllocation.schedulerBacklogTimeout must be > 0!")
    +    }
    +    if (sustainedSchedulerBacklogTimeout <= 0) {
    +      throw new SparkException(
    +        "spark.dynamicAllocation.sustainedSchedulerBacklogTimeout must be > 0!")
    +    }
    +    if (executorIdleTimeout <= 0) {
    +      throw new SparkException("spark.dynamicAllocation.executorIdleTimeout must be > 0!")
    +    }
    +    // Require external shuffle service for dynamic allocation
    --- End diff --
    
    I personally don't see much use of dynamic allocation if you can't both add 
and remove executors. By default we start the cluster at the max number of 
executors anyway, so I see little reason to support one but not the other.
    
    Now, if we do support both, then we have to enable the external shuffle 
service, because dynamic allocation is basically broken if we kill executors 
without it. Shuffle is different from caching in that keeping the shuffle files 
is not only an optimization but also a necessity for correctness. Since 
shuffling is such a common thing in Spark, I think we should fail the 
application early, before the user realizes their job is being re-run over and 
over again. But yes, I think we have left out the caching story so far. My 
original design was to add a warning or maybe throw an exception if dynamic 
allocation is enabled and the user tries to cache stuff, and I believe we still 
need to add that.
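
    A minimal sketch of what such a fail-early check could look like, written 
so it runs without Spark on the classpath. The config keys 
`spark.dynamicAllocation.enabled` and `spark.shuffle.service.enabled`, the 
`ConfigException` class, and the plain `Map` standing in for the Spark 
configuration are all assumptions for illustration, not the code in this PR.

```scala
// Hypothetical stand-in for SparkException, so the sketch is self-contained.
final class ConfigException(msg: String) extends Exception(msg)

object DynamicAllocationCheck {
  // Assumed config keys; the real names used by the patch may differ.
  private val DynamicAllocationKey = "spark.dynamicAllocation.enabled"
  private val ShuffleServiceKey = "spark.shuffle.service.enabled"

  // Throws if dynamic allocation is requested without the external shuffle
  // service; returns normally otherwise. Missing keys are treated as "false".
  def validate(conf: Map[String, String]): Unit = {
    val dynamicAllocation = conf.getOrElse(DynamicAllocationKey, "false").toBoolean
    val shuffleService = conf.getOrElse(ShuffleServiceKey, "false").toBoolean
    if (dynamicAllocation && !shuffleService) {
      throw new ConfigException(
        s"$ShuffleServiceKey must be enabled when $DynamicAllocationKey is set: " +
          "removing an executor would otherwise discard its shuffle files.")
    }
  }
}
```

    Running the check once at startup means a misconfigured application stops 
with a clear message instead of repeatedly recomputing lost shuffle output.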

