I am running Spark Standalone mode, and I am finding that when I configure ports 
(e.g. spark.blockManager.port) in both the Spark Master's spark-defaults.conf 
and the Spark Worker's, the Spark Master's port is the one that gets used by 
all the workers. Judging by the code, this seems to be by design. If executors 
are small (so that many of them run per worker), the 16 ports attempted will be 
exhausted, and executors will fail to start. This is further exacerbated by the 
fact that, in my particular circumstance, multiple Spark Workers can exist on 
the same machine.
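
For context, here is roughly what my configuration looks like. The values are 
illustrative, but the keys are standard Spark properties; spark.port.maxRetries 
defaults to 16, which is where the "16 ports attempted" above comes from:

    # spark-defaults.conf on the Spark Master (illustrative values)
    spark.blockManager.port   40000
    spark.port.maxRetries     16      # default; on a bind conflict Spark retries successive ports up to 16 times

    # spark-defaults.conf on the Spark Worker (illustrative values)
    spark.blockManager.port   50000   # currently ignored: the Master's 40000 is pushed down instead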

What are the community's thoughts on changing this behavior such that:

  1.  The port push-down only happens if the Spark Worker's port 
configuration is not set. This won't solve the problem, but it will mitigate 
it, and it seems to make sense from a user-experience point of view.

Similarly, I'd like to prevent environment variable push-down as well. 
Alternatively, instead of 1., a configurable switch to turn off push-down of 
port configuration, plus a separate one to turn off environment variable 
push-down, would work too (see the sketch below).
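
To make that second option concrete, here is a rough sketch of what I mean. 
The property names below are hypothetical, invented purely for illustration, 
and do not exist in Spark today:

    # Hypothetical switches (names made up for this sketch), e.g. in the Worker's spark-defaults.conf
    spark.standalone.pushDownPortConfig   false   # keep the Master's port configuration from overriding the Worker's
    spark.standalone.pushDownEnvVars      false   # likewise for environment variables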

Please share some of your thoughts 😊

Regards,
Sean
