Hi all,

Is anyone here interested in adding the ability to request GPUs to Spark's
client (i.e., spark-submit)? As of now, YARN 3.0's resource manager has the
ability to schedule GPUs as resources via cgroups, but the Spark client has
no way to request them.
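
For context, my (possibly imperfect) reading of the Hadoop 3 GPU docs is
that the NodeManager side is enabled with roughly the following in
yarn-site.xml (property names are from memory, so please correct me if
they're off; DominantResourceCalculator and the cgroups isolation settings
are also needed but omitted here):

  <configuration>
    <!-- Enable the GPU plugin so NodeManagers advertise yarn.io/gpu -->
    <property>
      <name>yarn.nodemanager.resource-plugins</name>
      <value>yarn.io/gpu</value>
    </property>
    <!-- Auto-discover the GPU devices on each host -->
    <property>
      <name>yarn.nodemanager.resource-plugins.gpu.allowed-gpu-devices</name>
      <value>auto</value>
    </property>
  </configuration>

As far as I can tell, though, there is no spark-submit option that maps
onto a yarn.io/gpu request.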

The ability to guarantee GPU resources would be practically useful for my
organization. Right now, the only way to do that is to request all of a
node's memory (or all of its CPUs), which is very kludgy and wastes
resources, especially if your node has more than one GPU and your code was
written such that an executor can use only one GPU at a time.
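
Concretely, the workaround looks something like this (the numbers are made
up for illustration; assume GPU nodes with 16 cores and about 110 GB of
YARN-allocatable memory each, so that one executor monopolizes a node and,
in effect, its GPUs):

  # Reserve whole nodes by asking for all of their cores and memory.
  spark-submit \
    --master yarn \
    --deploy-mode cluster \
    --num-executors 4 \
    --executor-cores 16 \
    --executor-memory 110g \
    my_gpu_job.py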

Otherwise, I'm just not sure of a good way to make use of libraries like
Databricks' Deep Learning Pipelines
<https://github.com/databricks/spark-deep-learning> for GPU-heavy
computation, unless you happen to be in an organization that can virtualize
compute nodes such that each node has only one GPU. Of course, I realize
that many Databricks customers are on Azure or AWS, which make this easy.
Is this what people normally do in industry?

This is something I am interested in working on, unless others out there
have advice on why this is a bad idea.

Unfortunately, I am not familiar enough with Mesos and Kubernetes right now
to know how they schedule GPU resources, or whether adding support for
requesting GPUs from them via spark-submit would be straightforward.

Daniel

-- 
Daniel Galvez
http://danielgalvez.me
https://github.com/galv
