Hi Steve,

I think that passing a Kerberos keytab around is one of those bad ideas that is entirely appropriate to re-question every single time you come across it. It has already been used in Spark when interacting with Kerberos systems that do not support delegation tokens. Without a keytab, any such system will eventually stop talking to Spark once the passed Kerberos tickets expire and can no longer be renewed.

It is one of those "best bad idea we have" situations: it has arisen, been discussed to death, and finally, grudgingly, been settled with an interim-only solution of passing the keytab to the worker to renew Kerberos tickets. A long-time notable offender in this area is secure Kafka. Thankfully, Kafka delegation tokens are soon to be supported in Spark, removing the need to pass keytabs around when interacting with Kafka.

This particular thread could probably be better renamed "Generic Datasource v2 support for Kerberos configuration". I would like to steer away from discussing alternate architectures that could handle a lack of delegation tokens (it is a worthwhile conversation, but a long and involved one that would distract from this narrowly defined topic) and focus just on configuration information.

A very quick look through various client code has identified at least the following configuration information that could potentially be of use to a datasource that uses Kerberos:

* krb5ConfPath
* kerberos debugging flags
* spark.security.credentials.${service}.enabled
* JAAS config
* ZKServerPrincipal ??

It is entirely feasible that each datasource may require its own unique Kerberos configuration (e.g. you are pulling from an external datasource that has a different KDC than the YARN cluster you are running on).
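
To make the configuration items above a little more concrete, here is a rough Java sketch of how per-datasource Kerberos settings might be collected from a datasource's options and mapped onto JVM system properties. The class name, the "kerberos." option prefix and the option keys are all hypothetical, made up for illustration, and are not an existing Spark API. The only real names are the JVM properties (java.security.krb5.conf, sun.security.krb5.debug, java.security.auth.login.config) and the spark.security.credentials.${service}.enabled pattern. ZKServerPrincipal is left out because the list itself marks it as uncertain.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical holder for per-datasource Kerberos settings (not a Spark API). */
final class DatasourceKerberosConf {
    // Hypothetical option prefix, invented for this sketch.
    static final String PREFIX = "kerberos.";

    final String krb5ConfPath;   // path to a krb5.conf, or null if not set
    final boolean debug;         // whether Kerberos debug output is requested
    final String jaasConfigPath; // path to a JAAS config file, or null if not set

    DatasourceKerberosConf(String krb5ConfPath, boolean debug, String jaasConfigPath) {
        this.krb5ConfPath = krb5ConfPath;
        this.debug = debug;
        this.jaasConfigPath = jaasConfigPath;
    }

    /** Reads the hypothetical "kerberos.*" keys from a datasource's options. */
    static DatasourceKerberosConf fromOptions(Map<String, String> options) {
        return new DatasourceKerberosConf(
            options.get(PREFIX + "krb5ConfPath"),
            Boolean.parseBoolean(options.get(PREFIX + "debug")), // null parses as false
            options.get(PREFIX + "jaasConfigPath"));
    }

    /** Maps the settings that are present onto the corresponding JVM system properties. */
    Map<String, String> toSystemProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        if (krb5ConfPath != null) {
            props.put("java.security.krb5.conf", krb5ConfPath);
        }
        if (debug) {
            props.put("sun.security.krb5.debug", "true");
        }
        if (jaasConfigPath != null) {
            props.put("java.security.auth.login.config", jaasConfigPath);
        }
        return props;
    }

    /** Builds the spark.security.credentials.${service}.enabled key for a service. */
    static String credentialsEnabledKey(String service) {
        return "spark.security.credentials." + service + ".enabled";
    }
}
```

One caveat with this sketch: these system properties are JVM-wide, so this mapping on its own would not cover two datasources in the same JVM that need different KDCs. That case would need something beyond plain system properties.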