Hi Xiaohan,

You're right that if a malicious user can get a fake DN to register with the NN, then the low ports don't matter. However, to get a malicious DN to register with the NN, that user would need access to the DN keytab. If the user has access to that, then all is lost anyway, since by logging in with that keytab the user can act as an HDFS superuser.
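
As a side note on why the port number matters at all: on a typical Unix system only root (or a process granted an equivalent capability) can bind a port below 1024. Here is a minimal Python sketch of that rule. The function names and the threshold constant are my own illustration, not Hadoop code, and 1004 and 2004 are simply the values from the quoted message below.

```python
# Illustrative only: these helpers are hypothetical, not part of Hadoop.
PRIVILEGED_PORT_LIMIT = 1024  # conventional Unix boundary; some systems let admins change it


def is_privileged_port(port: int) -> bool:
    """Return True if binding this port normally requires root on Unix."""
    if not 0 < port < 65536:
        raise ValueError(f"not a valid TCP port: {port}")
    return port < PRIVILEGED_PORT_LIMIT


def unprivileged_user_can_rebind(port: int) -> bool:
    """Could a non-root process start listening on this port once the real DN is gone?"""
    return not is_privileged_port(port)


for port in (1004, 2004):
    print(port, "rebindable by a non-root user:", unprivileged_user_can_rebind(port))
```

A DN on 2004 leaves the port open to any local user once the real process dies; a DN on 1004 does not.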

The purpose of requiring the low ports is to prevent the following scenario:

1) A malicious user finds a crashed DN process, or somehow causes a DN process to crash.
2) Before the NN considers that DN dead (by default about 10 minutes), the malicious user starts a fake DN process on the same (high) ports.
3) For those 10 minutes, the NN continues to tell clients that it's OK to write to the DN that has just crashed.
4) The malicious user steals all the data that clients write to those ports during those 10 minutes.

Because only root can bind ports below 1024, an unprivileged user cannot carry out step 2 when the DN listens on low ports.

I hope this clears things up.

--
Aaron T. Myers
Software Engineer, Cloudera

On Fri, Nov 16, 2012 at 6:44 AM, Xiaohan <yians.x...@huawei.com> wrote:
> Hi, guys.
>
> Our cluster is now moving to secure mode. We have found many differences
> from non-secure mode; one is how the datanode is started. I am not sure how
> it works, so I am sending this email to ask.
> Secure mode must use jsvc-like tools to start the datanode, because that
> lets the datanode listen on a port below 1024 while running as a non-root user.
> I searched Google for the reason for using a port below 1024 and only found
> that Cloudera's CDH doc describes it: "DataNode must be
> below 1024, because this provides part of the security mechanism to make it
> impossible for a user to run a map task which impersonates a DataNode."
> I tried configuring the datanode's http port as 2004 (the suggested value
> is 1004), which is above 1024, and then started it in secure mode. As
> expected, it failed to start. But I found that the failure happens
> because the DataNode itself checks the port number and throws an exception.
> Since a user running a map task may impersonate the DataNode, he could also
> change the DataNode code to bypass that check. If he does, he can still
> impersonate the DataNode on a port above 1024, which a non-root user, and
> therefore an application in a map task, can use.
> Then I supposed that the NN should also do the check, so I deleted the check
> code in the DataNode, configured the http port as 2004, and started the
> DataNode in secure mode. The DataNode started successfully and the NN
> accepted it.
> Data was also written to the DataNode. Everything works as if the
> DataNode were a normal one.
>
> Is this a defect, or have I missed something? Either way, please let
> me know. Thank you.
>
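
A footnote on the 10-minute window in the reply above: as far as I know, the NN derives its dead-node timeout from two settings, as twice the heartbeat recheck interval plus ten heartbeat intervals. Below is a rough sketch of that arithmetic, assuming the usual defaults of a 5-minute recheck interval and a 3-second heartbeat. The function and parameter names are mine, and the actual values depend on your own configuration.

```python
# Hypothetical helper: mirrors the timeout arithmetic as I understand it,
# not an actual Hadoop API.
def dead_node_timeout_ms(recheck_interval_ms: int = 5 * 60 * 1000,
                         heartbeat_interval_s: int = 3) -> int:
    """Time without heartbeats after which the NN treats a DN as dead."""
    return 2 * recheck_interval_ms + 10 * 1000 * heartbeat_interval_s


timeout_ms = dead_node_timeout_ms()
print(f"window for a fake DN: {timeout_ms / 60000} minutes")
```

With those assumed defaults the window comes out slightly over 10 minutes, consistent with the figure in the reply.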