Hi everyone, I've been trying to set up Spark so that it can read data from HDFS when the HDFS cluster is integrated with Kerberos authentication.
I've been using the Spark shell to attempt to read from HDFS, in local mode. I've set all of the appropriate properties in core-site.xml and hdfs-site.xml, and they appear to be correct since I can access and write data using the Hadoop command line utilities. I've also set HADOOP_CONF_DIR to point to the directory where core-site.xml and hdfs-site.xml live.

I used UserGroupInformation.setConfiguration(conf) and UserGroupInformation.loginUserFromKeytab() to set up the Kerberos credentials, and then called SparkContext.newAPIHadoopFile(conf) instead of SparkContext.textFile(), which I would think would not pass the appropriate configuration along with the Kerberos credentials. When I do that, I get this stack trace (sorry about the color):

java.io.IOException: Can't get Master Kerberos principal for use as renewer
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
        at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:242)
        at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:385)
        at org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:94)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:204)
        at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:202)

I was wondering if anyone has had any experience setting up Spark to read from Kerberized HDFS. What configurations need to be set in spark-env.sh? What am I missing? Also, will I have an issue if I try to access HDFS in distributed mode, using a standalone setup? (A rough sketch of the shell session I'm running is below my signature.)

Thanks,

-Matt Cheah
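P.S. For reference, here is roughly what I'm running in the Spark shell. This is only a minimal sketch; the config file paths, principal, keytab path, and HDFS URL are placeholders, not my actual values.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
    import org.apache.hadoop.security.UserGroupInformation

    // Load the cluster config explicitly; these paths stand in for wherever
    // HADOOP_CONF_DIR points on my machine.
    val conf = new Configuration()
    conf.addResource(new Path("/etc/hadoop/conf/core-site.xml"))
    conf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"))

    // Kerberos login from a keytab; principal and keytab path are placeholders.
    UserGroupInformation.setConfiguration(conf)
    UserGroupInformation.loginUserFromKeytab("someuser@EXAMPLE.COM", "/path/to/someuser.keytab")

    // Read through the new Hadoop API so the Configuration (and with it the
    // Kerberos credentials) is passed to the input format.
    val rdd = sc.newAPIHadoopFile(
      "hdfs://namenode:8020/user/someuser/input.txt",
      classOf[TextInputFormat],
      classOf[LongWritable],
      classOf[Text],
      conf)

    // getSplits() runs when the first action is triggered, which is where the
    // exception above surfaces.
    rdd.count()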