> On 25 Sep 2018, at 07:52, tigerquoll <tigerqu...@outlook.com> wrote:
> 
> To give some Kerberos-specific examples, the spark-submit args:
> --conf spark.yarn.keytab=path_to_keytab --conf
> spark.yarn.principal=princi...@realm.com
> 
> are currently not passed through to the data sources.
> 

I'm not sure why the data sources would need to know the Kerberos login 
details. I certainly wouldn't give them the keytab path (or indeed, access to 
it), and as for the principal, UserGroupInformation.getCurrentUser() should 
return that, including support for UGI.doAs() and the ability to issue 
calls as different users from the same process. 
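Something like this (an untested sketch, not code lifted from Spark itself) is 
all a data source should need in order to discover who it is running as:

  import org.apache.hadoop.security.UserGroupInformation

  // the identity of whoever the process (or the enclosing doAs block) is logged in as
  val ugi = UserGroupInformation.getCurrentUser()
  val principal = ugi.getUserName()       // e.g. "user@REALM.COM"
  val shortName = ugi.getShortUserName()  // e.g. "user"

No keytab path, no password: just the identity.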

I'd also be reluctant to blindly pass Kerberos secrets over the network. 
What matters is that code interacting with a data source, destination, 
filesystem, etc. executes in the context of the intended caller, which is 
what UGI.getCurrentUser() should give you.
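To illustrate what I mean by "context of the intended caller", here's a rough 
sketch; doAs() and PrivilegedExceptionAction are the real Hadoop/JDK APIs, but 
the body is only a placeholder for whatever the data source actually does:

  import java.security.PrivilegedExceptionAction
  import org.apache.hadoop.security.UserGroupInformation

  def readAsCaller(caller: UserGroupInformation): Unit = {
    caller.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        // every Hadoop/data-source call made in here authenticates as `caller`,
        // not as the process-wide login user
        // readTable()  <- placeholder for the real data-source call
      }
    })
  }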

What also matters is that whatever credentials are needed to authenticate 
with a data source are passed to it. That's done in the spark-submit code 
for YARN by asking the filesystems, Hive & HBase for delegation tokens; I 
don't know about ZooKeeper there.
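For the filesystem case that boils down to the usual Hadoop delegation-token 
dance at submission time. Roughly (sketch only; the real renewer is derived 
from the RM, and Spark's own code is more involved):

  import org.apache.hadoop.conf.Configuration
  import org.apache.hadoop.fs.FileSystem
  import org.apache.hadoop.security.Credentials

  val conf = new Configuration()
  val creds = new Credentials()
  // ask the (default) filesystem for delegation tokens on behalf of the job
  FileSystem.get(conf).addDelegationTokens("yarn", creds)  // "yarn" renewer is illustrative only
  // `creds` then travels with the application, so workers never see a keytab

Hive & HBase have their own token-fetching hooks, but the shape is the same.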

I think what might be good here is to enumerate what data sources are expected 
to need from Kerberos (JIRA? Google doc?), and from any forms of service tokens, 
then see how those needs could be handled in a way which fits into the existing 
world of Kerberos ticket & Hadoop service token creation on submission or in 
the job driver, and hand-off to the workers which need them.

-Steve



