> On 25 Sep 2018, at 07:52, tigerquoll <tigerqu...@outlook.com> wrote:
>
> To give some Kerberos-specific examples, the spark-submit args:
> --conf spark.yarn.keytab=path_to_keytab --conf
> spark.yarn.principal=princi...@realm.com
>
> are currently not passed through to the data sources.
I'm not sure why the data sources would need to know the Kerberos login details. I certainly wouldn't give them the keytab path (or indeed, access to it), and as for the principal, UserGroupInformation.getCurrentUser() should return that, including support for UGI.doAs() and the ability to issue calls as different users from the same process. I'd also be reluctant to blindly pass Kerberos secrets over the network.

What does matter is that code interacting with a data source, destination, filesystem, etc. should be executing in the context of the intended caller, which UGI.getCurrentUser() reflects.

What also matters is that whatever authentication information is needed to authenticate with a data source is passed to it. That's done in the spark-submit code for YARN by asking the filesystems, Hive & HBase for delegation tokens; I don't know about ZooKeeper there.

I think what might be good here is to enumerate what data sources are expected to need from Kerberos (JIRA? Google doc), and from any forms of service tokens, then see how that could be handled in a way which fits into the existing world of Kerberos ticket & Hadoop service token creation on submission or in the job driver, and handoff to the workers which need them.

-Steve
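P.S. As a rough, untested sketch of the kind of thing I mean (the object and method names below are purely illustrative, though the Hadoop UGI, FileSystem and Credentials calls are real): a connector shouldn't need keytab/principal options of its own; it can pick up the caller's identity from UGI, run work in that caller's context with doAs(), and rely on delegation tokens collected at submission time.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

// Illustrative only: not a proposed API, just the Hadoop calls involved.
object KerberosContextSketch {

  // Who is the caller? No keytab or principal option is needed for this.
  def currentPrincipal(): String =
    UserGroupInformation.getCurrentUser.getUserName

  // Run work as a specific (already authenticated or proxy) user; the Hadoop
  // client libraries underneath then authenticate as that user.
  def fileLengthAs(ugi: UserGroupInformation, conf: Configuration, path: String): Long =
    ugi.doAs(new java.security.PrivilegedExceptionAction[Long] {
      override def run(): Long =
        FileSystem.get(conf).getFileStatus(new Path(path)).getLen
    })

  // At submission time, delegation tokens for the filesystems a job touches
  // are gathered into a Credentials set, which is then shipped to the workers.
  def collectFsTokens(conf: Configuration, renewer: String, paths: Seq[String]): Credentials = {
    val creds = new Credentials()
    paths.foreach { p =>
      new Path(p).getFileSystem(conf).addDelegationTokens(renewer, creds)
    }
    creds
  }
}

On YARN the executors end up authenticating to HDFS & co. with those collected tokens rather than with a keytab, which is why handing the keytab itself to every data source shouldn't be necessary.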