Hi,


I’m attempting to use Spark on Kubernetes to connect to a Kerberized Hadoop
cluster. While I’m able to connect to the company’s Hive tables and run
queries on them successfully, I’ve only managed to do this on a single
driver pod (with no executors). If I use any executor pods, the process
fails because the executors do not authenticate themselves with the
keytab, returning a SIMPLE authentication error instead. This is surprising
because the executors use the same image as the driver and should,
therefore, have the keytab and XML config files inside them. The driver is
able to authenticate itself with the keytab because it’s running the
target JAR, which instructs it to do so. I can see that the executors are
not running processes from the JAR, but are instead running tasks that have
been delegated by the driver. Please have a look at my Stack Overflow
question, which contains all the details:



https://stackoverflow.com/questions/54181560/when-running-spark-on-kubernetes-to-access-kerberized-hadoop-cluster-how-do-you
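
A quick way to sanity-check that the executors really do have the files
(the pod name and file paths below are placeholders for my actual setup)
would be something like:

    # Placeholder pod name and paths: list the Hadoop XML configs and the
    # keytab entries inside a running executor container.
    kubectl exec <executor-pod-name> -- ls /etc/hadoop/conf
    kubectl exec <executor-pod-name> -- klist -k /etc/krb5.keytab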





My main references while trying to implement this architecture have been
the following:

   - https://github.com/apache/spark/blob/master/docs/security.md
   - https://www.slideshare.net/Hadoop_Summit/running-secured-spark-job-in-kubernetes-compute-cluster-and-integrating-with-kerberized-hdfs
   - https://www.iteblog.com/sparksummit2018/apache-spark-on-k8s-and-hdfs-security-with-ilan-flonenko-iteblog.pdf



Initially I attempted option 2 in the first link, but it failed with the
same SIMPLE authentication error. I’ve also tried following the second and
third links: I attempted to pass the keytab as a secret via one of the
config parameters in the spark-submit command (as described here:
https://spark.apache.org/docs/latest/running-on-kubernetes.html), but
unfortunately this also returns the same error.
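
Concretely, the submit command looks roughly like the sketch below (the API
server address, image, secret name, mount path, and JAR path are
placeholders for my real values); the keytab secret is mounted into both
the driver and executor pods via the spark.kubernetes.driver.secrets.* and
spark.kubernetes.executor.secrets.* options from the running-on-kubernetes
page:

    # Placeholder names and paths throughout. The "hadoop-keytab"
    # Kubernetes secret (containing the keytab file) is mounted into both
    # the driver and the executors at /mnt/secrets.
    spark-submit \
      --master k8s://https://<k8s-apiserver>:<port> \
      --deploy-mode cluster \
      --name kerberized-hive-job \
      --conf spark.executor.instances=2 \
      --conf spark.kubernetes.container.image=<my-spark-image> \
      --conf spark.kubernetes.driver.secrets.hadoop-keytab=/mnt/secrets \
      --conf spark.kubernetes.executor.secrets.hadoop-keytab=/mnt/secrets \
      local:///opt/spark/jars/<my-job>.jar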



I would be grateful for any advice you can offer.



Thank you,

 Karan
