Hello,

I am trying to understand how impersonation works in a Hadoop environment. I found a few resources:

About doAs and proxy users: http://dewoods.com/blog/hadoop-kerberos-guide
About tokens: https://hortonworks.com/blog/the-role-of-delegation-tokens-in-apache-hadoop-security/
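From those, my mental model of the doAs/proxy-user part is roughly the Java sketch below. The principal, keytab path, and user names are made-up placeholders, and I may well be holding the API wrong, so please correct me:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ProxyUserSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Log in as the service principal from a keytab (both are placeholders).
    UserGroupInformation service = UserGroupInformation
        .loginUserFromKeytabAndReturnUGI("svc/host@EXAMPLE.COM", "/etc/security/svc.keytab");

    // Impersonate the end user. As I understand it, this only succeeds if
    // core-site.xml allows it via hadoop.proxyuser.svc.hosts /
    // hadoop.proxyuser.svc.groups on the NameNode side.
    UserGroupInformation proxy = UserGroupInformation.createProxyUser("alice", service);

    // Everything inside doAs runs as "alice" as far as HDFS authorization is
    // concerned, while authentication still uses the service's credentials.
    FileStatus[] listing = proxy.doAs(
        (PrivilegedExceptionAction<FileStatus[]>) () ->
            FileSystem.get(conf).listStatus(new Path("/user/alice")));

    for (FileStatus st : listing) {
      System.out.println(st.getPath());
    }
  }
}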
But I was not able to connect all the dots with respect to the full flow of operations. My current understanding is:

1. The user does a kinit and executes an end-user-facing program like beeline, spark-submit, etc.
2. The program is app-specific and gets service tickets for HDFS.
3. It then gets tokens for all the services it may need during the job execution and saves the tokens in an HDFS directory.
4. The program then connects to a job executor (using a service ticket for the job executor??), e.g. YARN, passing the job info and the token path.
5. The job executor gets the token and initializes a UGI, and all communication with HDFS is done using the token; Kerberos tickets are not used.

Is the above high-level understanding correct? (I have more follow-up queries.) A sketch of what I think steps 3 and 5 look like in code is in the PS below.

Can the token mechanism be skipped and only Kerberos used at each layer? If so, any resources would help. My guess at what that would look like is in the PPS.

My final aim is to write a Spark connector, with impersonation support, for a data storage system which does not use Hadoop (tokens) but supports Kerberos.

Thanks & regards,
Sri
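PS: Here is a rough Java sketch of what I think the client side (step 3) and executor side (step 5) look like. The renewer name and the token file path are made-up examples:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.UserGroupInformation;

public class TokenFlowSketch {

  // Client side (step 3): the Kerberos-authenticated client collects HDFS
  // delegation tokens and writes them to a file the job can read later.
  static void clientSide(Configuration conf) throws Exception {
    Credentials creds = new Credentials();
    FileSystem fs = FileSystem.get(conf);
    // "yarn" as the renewer and the path below are placeholder choices.
    fs.addDelegationTokens("yarn", creds);
    creds.writeTokenStorageFile(new Path("hdfs:///tmp/job123/tokens"), conf);
  }

  // Executor side (step 5): read the tokens back and attach them to the
  // current UGI, so that subsequent HDFS calls authenticate with the
  // delegation token rather than a Kerberos ticket.
  static void executorSide(Configuration conf) throws Exception {
    Credentials creds = Credentials.readTokenStorageFile(
        new Path("hdfs:///tmp/job123/tokens"), conf);
    UserGroupInformation.getCurrentUser().addCredentials(creds);
    // From here on, FileSystem.get(conf) calls should use the token.
  }
}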
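PPS: For the Kerberos-only question, my guess is that each executor would have to do its own keytab login and the storage system would need its own notion of proxy authorization, analogous to hadoop.proxyuser.*. A hypothetical sketch follows; StoreClient is an invented stand-in for whatever kerberized client the storage system actually exposes, and it assumes that client picks up the JAAS subject that UGI establishes inside doAs:

import java.security.PrivilegedExceptionAction;

import org.apache.hadoop.security.UserGroupInformation;

public class KerberosOnlyConnectorSketch {

  // Invented interface standing in for the external store's client API.
  interface StoreClient {
    byte[] read(String key) throws Exception;
  }

  static byte[] readAsEndUser(String endUser, StoreClient client, String key) throws Exception {
    // Each executor logs in with its own keytab (shipped to it somehow,
    // e.g. via --keytab); no Hadoop tokens involved. Principal and path
    // are placeholders.
    UserGroupInformation service = UserGroupInformation
        .loginUserFromKeytabAndReturnUGI("svc/host@EXAMPLE.COM", "/local/path/svc.keytab");

    // Impersonate the end user on top of the service login; the store
    // would have to trust the service principal to proxy for end users.
    UserGroupInformation proxy = UserGroupInformation.createProxyUser(endUser, service);

    return proxy.doAs((PrivilegedExceptionAction<byte[]>) () -> client.read(key));
  }
}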