On 4 Nov 2016, at 01:37, Marcelo Vanzin <van...@cloudera.com> wrote:
> On Thu, Nov 3, 2016 at 3:47 PM, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:
>> What is the purpose of the delegation token renewal (the one that is done automatically by Hadoop libraries, after 1 day by default)? It seems that it always happens (every day) until the token expires, no matter what. I'd probably find an answer to that in a basic Hadoop security description.

* DTs allow a long-lived job to outlast the Kerberos ticket lifetime of the submitter, which is usually 48-72h.
* Submitting jobs with DTs limits the job's access to those services for which you have a DT, and there's no need to acquire Kerberos tickets for every query being run. This keeps load on Kerberos down, which is good because with Active Directory the KDC is usually shared with the rest of the organisation. Some Kerberos servers treat bulk access from a few thousand machines as a brute-force attack.
* Delegation tokens can also be revoked at the NN. After a process terminates, something (YARN NM?) can chat with the NN and say "no longer valid". In contrast, Kerberos TGTs stay valid until they time out; there's no revocation mechanism.

> I'm not sure and I never really got a good answer to that (I had the same question in the past). My best guess is to limit how long an attacker can do bad things if he gets hold of a delegation token. But IMO if an attacker gets a delegation token, that's pretty bad regardless of how long he can use it...

Correct: it limits the damage. In contrast, if someone has your keytab, they have access until that KT expires.

>> I have a feeling that giving the keytab to Spark bypasses the concept behind delegation tokens. As I understand, the NN basically says that "your application can access hdfs with this delegation token, but only for 7 days".

> I'm not sure why there's a 7-day limit either, but let's assume there's a good reason. Basically the app, at that point, needs to prove to the NN it has a valid Kerberos credential.
> Whether that's from someone typing their password into a terminal, or code using a keytab, it doesn't really matter. If someone was worried about that user being malicious they'd disable the user's login in the KDC. This feature is needed because there are apps that need to keep running, unattended, for longer than HDFS's max lifetime setting.

Pretty much it. FWIW, that's why turning Kerberos on on a midweek morning, rather than a Friday evening, is wise: the 7-day timeout event will then start happening during working hours.

https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md
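To make the renew-interval vs. max-lifetime distinction concrete, here is a rough simulation sketch. It is a simplified model, not Hadoop's implementation: the class and function names (`DelegationToken`, `renew_until_failure`, `InvalidToken`) are hypothetical. The only real details are the two HDFS settings it mirrors, `dfs.namenode.delegation.token.renew-interval` (default 1 day) and `dfs.namenode.delegation.token.max-lifetime` (default 7 days). Kerberos login, token storage and the NN's expired-token cleanup are not modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Assumed to mirror the HDFS defaults of dfs.namenode.delegation.token.renew-interval
# (1 day) and dfs.namenode.delegation.token.max-lifetime (7 days).
RENEW_INTERVAL = timedelta(days=1)
MAX_LIFETIME = timedelta(days=7)


class InvalidToken(Exception):
    """Raised when renewing a token that is cancelled or past its limits."""


@dataclass
class DelegationToken:
    """Hypothetical, simplified model of a NameNode delegation token."""
    issue_date: datetime
    expiry_date: datetime  # moves forward on each successful renewal
    max_date: datetime     # fixed at issue time; renewal never goes past it
    cancelled: bool = False

    @classmethod
    def issue(cls, now: datetime) -> "DelegationToken":
        # In reality, issuing needs a Kerberos login (password or keytab).
        return cls(now, now + RENEW_INTERVAL, now + MAX_LIFETIME)

    def is_valid(self, now: datetime) -> bool:
        return not self.cancelled and now <= self.expiry_date

    def renew(self, now: datetime) -> datetime:
        # A token that lapsed, was revoked, or hit its hard limit can't be renewed.
        if self.cancelled or now > self.expiry_date or now > self.max_date:
            raise InvalidToken("token cancelled or expired")
        # Renewal extends by one interval, capped at the max lifetime.
        self.expiry_date = min(now + RENEW_INTERVAL, self.max_date)
        return self.expiry_date

    def cancel(self) -> None:
        # Revocation at the NN takes effect at once, unlike a Kerberos TGT.
        self.cancelled = True


def renew_until_failure(start: datetime, every: timedelta):
    """Issue a token at `start` and renew it every `every` until renewal fails.

    Returns the token and the number of successful renewals.
    """
    token = DelegationToken.issue(start)
    renewals = 0
    now = start
    while True:
        now += every
        try:
            token.renew(now)
        except InvalidToken:
            return token, renewals
        renewals += 1


if __name__ == "__main__":
    start = datetime(2016, 11, 2, 9, 0)  # hypothetical: a Wednesday, 09:00
    token, renewals = renew_until_failure(start, timedelta(hours=23))
    print(renewals, token.expiry_date, token.expiry_date.strftime("%A"))
```

With a 23-hour renewal cadence, the model renews the token seven times and then hits its hard limit at 09:00 the following Wednesday. No amount of renewal gets past `max_date`. A job holding a keytab would instead log in to the KDC again and obtain a brand-new token (modelled here only as another `DelegationToken.issue` call).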