Before I create a JIRA issue, maybe somebody can shed some light on the idea
behind dfs.https.enable. We've enabled SPNEGO authentication on our web
interfaces, and I think the HTTP connection should be encrypted: people with
the right permissions can run administrative commands through the NameNode web
interface, so man-in-the-middle attacks should be prevented.

Setting dfs.https.enable to 'true' works for enabling SSL. However, in 
combination with hadoop.security.authentication=kerberos and 
dfs.https.need.client.auth=false this doesn't work on the NameNode; it *does* 
set up an SSL socket but *doesn't* use my keystore. I can see where this
happens in the (somewhat messy) code: the conditional paths are wrong at one
point, basically because the code stops distinguishing between dfs.https.enable
and dfs.https.need.client.auth. This happens in NameNode,
Krb5AndSslSocketConnector, and HttpServer. I could fix this and add SSL without
client authentication, but that's intrusive since I'd need to change some
method signatures (of methods that DataNode and SecondaryNameNode also depend
on).
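To make concrete what I'd expect, here's a rough Python model of how the three
settings could combine. The function and field names are made up for
illustration and are not Hadoop's actual code; the model only encodes my
assumption that the keystore should be used whenever SSL is on, and that
client certificates should be required only when dfs.https.need.client.auth is
true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HttpsPlan:
    """Hypothetical model of what the web server should set up (not Hadoop code)."""
    use_ssl: bool              # serve the web UI over an SSL socket
    use_keystore: bool         # load the server certificate from the configured keystore
    require_client_cert: bool  # demand client-side certificates


def _flag(conf: dict, key: str) -> bool:
    # Treat missing keys as "false"; compare case-insensitively.
    return conf.get(key, "false").strip().lower() == "true"


def expected_plan(conf: dict) -> HttpsPlan:
    """Sketch of the behaviour I'd expect from the three settings."""
    https_enabled = _flag(conf, "dfs.https.enable")
    need_client_auth = _flag(conf, "dfs.https.need.client.auth")
    # hadoop.security.authentication is deliberately not consulted here: in this
    # model, user authentication (SPNEGO) is independent of transport encryption.
    return HttpsPlan(
        use_ssl=https_enabled,
        # Any SSL socket needs a server certificate, so the keystore should be
        # used whenever SSL is on, regardless of client authentication.
        use_keystore=https_enabled,
        require_client_cert=https_enabled and need_client_auth,
    )


if __name__ == "__main__":
    mine = {
        "dfs.https.enable": "true",
        "hadoop.security.authentication": "kerberos",
        "dfs.https.need.client.auth": "false",
    }
    print(expected_plan(mine))
```

With the combination I'm using, this model gives use_keystore=True, whereas the
NameNode behaves as if it were False.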

But maybe I've misunderstood this. The bug might be that dfs.https.enable
enables SSL even without dfs.https.need.client.auth. I see that HDFS-2617
discusses taking out KSSL in favor of SPNEGO. I don't know what KSSL is, but
I'm guessing it is client-side certificate authentication using Kerberos
tickets. Taking that out is fine (SPNEGO works for us), but then what about
encryption on the wire? Or is it common practice to simply not access the NN
web interface as an admin from outside a trusted network, or to access it only
through a VPN?

Can anybody tell me what the idea of dfs.https.enable is? And how this is 
supposed to work in combination with hadoop.security.authentication and 
dfs.https.need.client.auth?

Thanks,
Evert Lammerts
