PR #4937 (https://github.com/apache/spark/pull/4937) adds a feature that allows 
Spark configuration options (whether set on the command line, in an environment 
variable, or in a configuration file) to be specified via a simple expression 
language.


Such a feature has the following end-user benefits:
- Allows time intervals or byte quantities to be specified in appropriate, 
easy-to-follow units, e.g. 1 week rather than 604800 seconds

- Allows a configuration option to be scaled in relation to system 
attributes, e.g.:

SPARK_WORKER_CORES = numCores - 1

SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB

- Gives the ability to scale multiple configuration options together, e.g.:

spark.driver.memory = 0.75 * physicalMemoryBytes

spark.driver.maxResultSize = spark.driver.memory * 0.8
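
As a rough illustration of the examples above (not the PR's actual 
implementation), expressions like these could be evaluated with a small 
arithmetic parser. The Python sketch below is hypothetical throughout: the 
`UNITS` table, the binary (1024-based) byte multipliers, and the `evaluate` 
function are assumptions made for illustration. System attributes and 
previously resolved options are simply looked up in a dictionary supplied by 
the caller.

```python
import re

# Assumed unit table: time units resolve to seconds, byte units to bytes.
# The 1024-based byte multipliers are an assumption, not taken from the PR.
UNITS = {
    "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3,
    "second": 1, "seconds": 1, "minute": 60, "minutes": 60,
    "hour": 3600, "hours": 3600, "day": 86400, "days": 86400,
    "week": 604800, "weeks": 604800,
}

# A token is a number, a (possibly dotted) name, or an operator/parenthesis.
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_][\w.]*)|([-+*/()]))")


def tokenize(text):
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected input: {text[pos:]!r}")
        number, name, op = match.groups()
        if number is not None:
            tokens.append(("num", float(number)))
        elif name is not None:
            tokens.append(("name", name))
        else:
            tokens.append(("op", op))
        pos = match.end()
    return tokens


def evaluate(expr, env):
    """Evaluate an arithmetic expression; names are looked up in env."""
    tokens = tokenize(expr)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else (None, None)

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        token = tokens[pos]
        pos += 1
        return token

    def atom():
        kind, value = advance()
        if kind == "num":
            next_kind, next_value = peek()
            # A number directly followed by a known unit is scaled by it.
            if next_kind == "name" and next_value.lower() in UNITS:
                advance()
                return value * UNITS[next_value.lower()]
            return value
        if kind == "name":
            return float(env[value])
        if (kind, value) == ("op", "("):
            result = total()
            if advance() != ("op", ")"):
                raise ValueError("expected ')'")
            return result
        raise ValueError(f"unexpected token: {value!r}")

    def product():
        result = atom()
        while peek() in (("op", "*"), ("op", "/")):
            _, op = advance()
            rhs = atom()
            result = result * rhs if op == "*" else result / rhs
        return result

    def total():
        result = product()
        while peek() in (("op", "+"), ("op", "-")):
            _, op = advance()
            rhs = product()
            result = result + rhs if op == "+" else result - rhs
        return result

    value = total()
    if pos != len(tokens):
        raise ValueError(f"unexpected token: {tokens[pos][1]!r}")
    return value


if __name__ == "__main__":
    # Example values only; a real implementation would query the system.
    env = {"numCores": 8, "physicalMemoryBytes": 16 * 1024 ** 3}
    print(evaluate("numCores - 1", env))
    print(evaluate("physicalMemoryBytes - 1.5 GB", env))
    # Options that depend on each other are resolved in dependency order.
    env["spark.driver.memory"] = evaluate("0.75 * physicalMemoryBytes", env)
    print(evaluate("spark.driver.memory * 0.8", env))
```

Resolving `spark.driver.memory` before `spark.driver.maxResultSize` is the 
simplest way to handle options that reference each other; how the PR orders or 
detects such dependencies is not covered by this sketch.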


The following functions are currently supported by this PR:

NumCores:             Number of cores assigned to the JVM (usually equal to the 
                      number of physical machine cores)

PhysicalMemoryBytes:  Memory size of the host machine

JVMTotalMemoryBytes:  Current bytes of memory allocated to the JVM

JVMMaxMemoryBytes:    Maximum number of bytes of memory available to the JVM

JVMFreeMemoryBytes:   JVMMaxMemoryBytes - JVMTotalMemoryBytes
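
One way to picture how such functions might be exposed to the expression 
language is as a table mapping names to callables. The Python sketch below is 
only an illustration under stated assumptions: `SYSTEM_FUNCTIONS` and `resolve` 
are hypothetical names, the case-insensitive lookup is an assumption, and only 
the two machine-level values are covered. The JVM* values would presumably come 
from the JVM itself (for instance `java.lang.Runtime`'s `totalMemory()` and 
`maxMemory()`), which has no direct Python equivalent.

```python
import os


def num_cores():
    # os.cpu_count() may return None when the count cannot be determined.
    return os.cpu_count() or 1


def physical_memory_bytes():
    # POSIX-only lookup; returns None where these sysconf names are missing.
    try:
        size = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None
    return size if size > 0 else None


# Hypothetical function table; names follow the list above.
SYSTEM_FUNCTIONS = {
    "NumCores": num_cores,
    "PhysicalMemoryBytes": physical_memory_bytes,
}


def resolve(name):
    """Look up a system function by name, ignoring case (an assumption)."""
    for known, function in SYSTEM_FUNCTIONS.items():
        if known.lower() == name.lower():
            return function()
    raise KeyError(name)


if __name__ == "__main__":
    print(resolve("numCores"))
    print(resolve("physicalMemoryBytes"))
```

New functions suggested on the list would then amount to new entries in such a 
table.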


Does anybody on the mailing list have ideas for other functions that would be 
useful when specifying Spark configuration options?

Regards,
Dale
                                          
