Hello,

Spark Operator is a tool that can deploy and scale Spark clusters on
Kubernetes and help with monitoring them. It follows the operator pattern [1]
introduced by CoreOS: it watches for changes in custom resources
representing the desired state of the clusters and performs the steps
needed to reach that state in Kubernetes using the K8s client. It’s written
in Java, and its dependencies overlap with Spark’s (logging, K8s client,
apache-commons-*, fasterxml-jackson, etc.). The operator also contains
metadata that allows it to be deployed smoothly via operatorhub.io [2].
For a basic overview, check the README on the project page [3], including
the GIF :) Another unique feature of this operator is the optional ability
to compile it to a native image with the GraalVM compiler, so that it starts
fast and has a very low memory footprint.
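To make the reconcile step of the operator pattern concrete, here is a minimal Java sketch. It assumes a single hypothetical `SparkClusterSpec` with only a name and a worker count, and a hypothetical `FakeCluster` that stands in for the Kubernetes API in memory, so no real K8s client calls are involved. This is not code from the spark-operator repository, only an illustration of comparing the desired state with the observed state and acting on the difference.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of an operator's reconcile step; not the spark-operator implementation. */
public class ReconcileSketch {

    /** Desired state, as a custom resource might declare it (hypothetical fields). */
    record SparkClusterSpec(String name, int workers) {}

    /** In-memory stand-in for the Kubernetes API: cluster name -> running worker pods. */
    static final class FakeCluster {
        final Map<String, Integer> runningWorkers = new HashMap<>();

        int workers(String name) {
            return runningWorkers.getOrDefault(name, 0);
        }

        void scaleTo(String name, int count) {
            if (count == 0) {
                runningWorkers.remove(name); // nothing left to run: drop the cluster entry
            } else {
                runningWorkers.put(name, count);
            }
        }
    }

    /** Compares desired vs. observed state, acts only on the difference, returns the delta applied. */
    static int reconcile(SparkClusterSpec desired, FakeCluster cluster) {
        int actual = cluster.workers(desired.name());
        int delta = desired.workers() - actual;
        if (delta != 0) {
            cluster.scaleTo(desired.name(), desired.workers());
        }
        return delta;
    }

    public static void main(String[] args) {
        FakeCluster cluster = new FakeCluster();
        System.out.println(reconcile(new SparkClusterSpec("my-spark", 3), cluster)); // create
        System.out.println(reconcile(new SparkClusterSpec("my-spark", 3), cluster)); // no-op
        System.out.println(reconcile(new SparkClusterSpec("my-spark", 1), cluster)); // scale down
    }
}
```

Running `main` prints 3, 0 and -2: the first call creates the workers, the second does nothing because the state already matches, and the third scales down. Because each pass only acts on the difference, repeating it is harmless, which is what lets an operator simply re-run reconciliation whenever a watched resource changes.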

We would like to contribute this project to Spark’s code base. It can’t be
distributed as a Spark package, because it’s not a library that can be used
from a Spark environment. So, if you are interested, the directory
resource-managers/kubernetes/spark-operator/ could be a suitable
destination.

The current repository is radanalyticsio/spark-operator [3] on GitHub, and it
also contains a test suite [4] that verifies whether the operator works well
on K8s (using minikube) and on OpenShift. I am not sure how to transfer
those tests, in case you would be interested in them as well.

I’ve already opened a PR [5], but it was closed, so I am opening the
discussion here first. The PR contained the old package names of our
organisation, radanalytics.io, but we are willing to change them to
anything more aligned with the existing Spark conventions; the same holds
for the license headers in all the source files.

jk


[1]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/

[2]: https://operatorhub.io/operator/radanalytics-spark

[3]: https://github.com/radanalyticsio/spark-operator

[4]: https://travis-ci.org/radanalyticsio/spark-operator

[5]: https://github.com/apache/spark/pull/26075
