PR#4937 (https://github.com/apache/spark/pull/4937) adds support for specifying
Spark configuration options (whether on the command line, in an environment
variable, or in a configuration file) via a simple expression language.
Such a feature has the following end-user benefits:
- Allows time intervals and byte quantities to be specified in appropriate,
  easy-to-follow units, e.g. 1 week rather than 604800 seconds
- Allows a configuration option to be scaled in relation to system
  attributes, e.g.
SPARK_WORKER_CORES = numCores - 1
SPARK_WORKER_MEMORY = physicalMemoryBytes - 1.5 GB
- Gives the ability to scale multiple configuration options together, e.g.:
spark.driver.memory = 0.75 * physicalMemoryBytes
spark.driver.maxResultSize = spark.driver.memory * 0.8
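As an illustration only (a naive Python sketch, not the PR's actual implementation), the second and third benefits amount to evaluating each value in a namespace of system attributes and previously resolved keys. All names below are hypothetical; the real PR evaluates expressions on the JVM:

```python
# Naive resolver for expressions like the examples above (illustrative only).
# 'numCores' and 'physicalMemoryBytes' are supplied as a plain dict here.
def resolve(config, system):
    resolved = {}
    pending = dict(config)
    while pending:
        progressed = False
        for key, expr in list(pending.items()):
            # Config keys contain dots, which are not valid Python names,
            # so expose already-resolved keys with underscores instead.
            names = dict(system)
            names.update({k.replace(".", "_"): v for k, v in resolved.items()})
            try:
                resolved[key] = eval(expr, {"__builtins__": {}}, names)
            except NameError:
                continue  # depends on a key that is not resolved yet
            del pending[key]
            progressed = True
        if not progressed:
            raise ValueError(f"unresolvable or circular: {sorted(pending)}")
    return resolved

system = {"numCores": 8, "physicalMemoryBytes": 16 * 1024**3}
conf = {
    "spark.driver.memory": "0.75 * physicalMemoryBytes",
    "spark.driver.maxResultSize": "spark_driver_memory * 0.8",
}
print(resolve(conf, system))
```

Note that spark.driver.maxResultSize resolves only after spark.driver.memory has been evaluated, which is the dependency ordering a real implementation would also have to handle.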
The following functions are currently supported by this PR:
NumCores: Number of cores assigned to the JVM (usually equal to the
physical machine's core count)
PhysicalMemoryBytes: Memory size of the hosting machine
JVMTotalMemoryBytes: Current bytes of memory allocated to the JVM
JVMMaxMemoryBytes: Maximum number of bytes of memory available to the JVM
JVMFreeMemoryBytes: maxMemoryBytes - totalMemoryBytes
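For a rough sense of what these functions return, here are Python stand-ins for the first two (illustrative assumptions only: the PR itself would read these from the JVM, e.g. java.lang.Runtime, rather than from the OS directly; the sysconf call below is POSIX-only):

```python
import os

def num_cores():
    """Stand-in for NumCores: logical cores visible to this process."""
    return os.cpu_count()

def physical_memory_bytes():
    """Stand-in for PhysicalMemoryBytes: total RAM (POSIX-only approximation)."""
    return os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")

print(num_cores(), physical_memory_bytes())
```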
I was wondering whether anybody on the mailing list has ideas for other
functions that would be useful when specifying Spark configuration options?
Regards,
Dale.