On 4 Nov 2016, at 01:37, Marcelo Vanzin <van...@cloudera.com> wrote:

On Thu, Nov 3, 2016 at 3:47 PM, Zsolt Tóth <toth.zsolt....@gmail.com> wrote:
What is the purpose of the delegation token renewal (the one that is done
automatically by Hadoop libraries, after 1 day by default)? It seems that it
always happens (every day) until the token expires, no matter what. I'd
probably find an answer to that in a basic Hadoop security description.



* DTs allow a long-lived job to outlast the Kerberos ticket lifetime of the 
submitter; usually 48-72h.
* Submitting jobs with DTs limits the access of the job to those services for 
which you have a DT, and there is no need to acquire Kerberos tickets for every 
query being run. This keeps load on Kerberos down, which is good as with Active 
Directory that's usually shared with the rest of the organisation. Some Kerberos 
servers treat bulk access from a few thousand machines as a brute-force attack.
* Delegation tokens can also be revoked at the NN. After a process terminates, 
something (YARN NM?) can chat with the NN and say "no longer valid". In 
contrast, Kerberos TGTs stay valid until that timeout, without any revocation 
mechanism.

I'm not sure and I never really got a good answer to that (I had the
same question in the past). My best guess is to limit how long an
attacker can do bad things if he gets hold of a delegation token. But
IMO if an attacker gets a delegation token, that's pretty bad
regardless of how long he can use it...


Correct: it limits the damage. In contrast, if someone has your keytab, they 
have access until that keytab expires.




I have a feeling that giving the keytab to Spark bypasses the concept behind
delegation tokens. As I understand, the NN basically says that "your
application can access hdfs with this delegation token, but only for 7
days".

I'm not sure why there's a 7 day limit either, but let's assume
there's a good reason. Basically the app, at that point, needs to
prove to the NN it has a valid kerberos credential. Whether that's
from someone typing their password into a terminal, or code using a
keytab, it doesn't really matter. If someone was worried about that
user being malicious they'd disable the user's login in the KDC.

This feature is needed because there are apps that need to keep
running, unattended, for longer than HDFS's max lifetime setting.
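
To make the interplay of the two limits concrete, here is a toy model in Python. It is a sketch only, not Hadoop or Spark code: the 1 day renew interval and 7 day max lifetime are the figures quoted in this thread, and the names DelegationToken, issue, renew and simulate are invented for the illustration. It assumes the app tries to renew shortly before each expiry and, when renewal is no longer possible, can obtain a fresh token only if it holds a keytab.

```python
from dataclasses import dataclass

HOUR = 3600
DAY = 24 * HOUR

# Figures quoted in this thread; real clusters may configure other values.
RENEW_INTERVAL = 1 * DAY
MAX_LIFETIME = 7 * DAY


@dataclass
class DelegationToken:
    """Toy stand-in for a delegation token (hypothetical, not a Hadoop class)."""
    expires_at: int  # current expiry, pushed forward by each renewal
    max_date: int    # hard limit that no renewal can move

    def is_valid(self, now: int) -> bool:
        return now < self.expires_at


def issue(now: int) -> DelegationToken:
    """Issue a fresh token; on a real cluster this needs Kerberos credentials."""
    return DelegationToken(now + RENEW_INTERVAL, now + MAX_LIFETIME)


def renew(token: DelegationToken, now: int) -> bool:
    """Extend expiry by one renew interval, capped at max_date."""
    if not token.is_valid(now) or now >= token.max_date:
        return False
    token.expires_at = min(now + RENEW_INTERVAL, token.max_date)
    return True


def simulate(run_hours: int, have_keytab: bool) -> int:
    """Return the hour at which access is lost, or run_hours if it never is."""
    token = issue(0)
    for hour in range(run_hours + 1):
        now = hour * HOUR
        if token.expires_at - now <= HOUR:  # token is about to expire
            if not renew(token, now):
                if not have_keytab:
                    return hour             # no way to get a new token
                token = issue(now)          # log in again from the keytab
    return run_hours


if __name__ == "__main__":
    print("without keytab, access lost at hour", simulate(10 * 24, False))
    print("with keytab, still running at hour", simulate(10 * 24, True))
```

In this model a ten day run without a keytab stops at the 7 day mark, while the same run with a keytab carries on by requesting a new token once the old one can no longer be renewed.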


Pretty much it. FWIW that's why turning on Kerberos on a midweek morning, rather 
than a Friday evening, is wise: the 7 day timeout event will then start 
happening during working hours.

https://github.com/apache/hadoop/blob/trunk/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-site/src/site/markdown/YarnApplicationSecurity.md
