Re: Can't get Master Kerberos principal for use as renewer

Finamore A. Tue, 17 Jun 2014 01:22:25 -0700

Update.

I've reconfigured the environment to use Spark 1.0.0 and the example
finally worked! :)


The difference for me was that Spark 1.0.0 only requires specifying the
Hadoop conf dir (HADOOP_CONF_DIR=/etc/hadoop/conf/).
I guess that with 0.9 there were problems locating this dir... but I'm
not sure why.
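A minimal Scala sketch of the check implied above. It assumes the only input that matters is the HADOOP_CONF_DIR environment variable. It resolves the directory, falling back to /etc/hadoop/conf (the path that worked here), and lists which of the usual *-site.xml files are missing. The object and method names are hypothetical, and this is not how Spark itself locates the directory.

```scala
import java.io.File

object HadoopConfDirCheck {
  // Hypothetical fallback: the directory that worked in this thread.
  val DefaultConfDir = "/etc/hadoop/conf"

  // Client-side files usually found in a Hadoop conf dir; which ones matter depends on the cluster.
  val SiteFiles = Seq("core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml")

  /** Prefer a non-blank HADOOP_CONF_DIR from the given environment, else the fallback. */
  def resolveConfDir(env: Map[String, String]): File =
    new File(env.get("HADOOP_CONF_DIR").map(_.trim).filter(_.nonEmpty).getOrElse(DefaultConfDir))

  /** Names from SiteFiles that are not regular files inside `dir`. */
  def missingSiteFiles(dir: File): Seq[String] =
    SiteFiles.filterNot(name => new File(dir, name).isFile)

  def main(args: Array[String]): Unit = {
    val dir = resolveConfDir(sys.env)
    println(s"Hadoop conf dir: ${dir.getPath}")
    val missing = missingSiteFiles(dir)
    if (missing.isEmpty) println("All expected *-site.xml files are present")
    else println(s"Missing: ${missing.mkString(", ")}")
  }
}
```

Run it with the same environment you use to launch spark-shell. If the reported directory is not the one holding your cluster's Kerberos settings, the client is probably reading a different configuration.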



On 16 June 2014 23:03, Finamore A. <[email protected]> wrote:

> Hi,
>
> I'm new to Spark and I'm trying to integrate it into my cluster.
> It's a small set of nodes running CDH 4.7 with Kerberos.
> The other services work fine with the authentication, but I'm having some
> trouble with Spark.
>
> First, I used the parcel available in Cloudera Manager (SPARK
> 0.9.0-1.cdh4.6.0.p0.98).
> Since the cluster runs CDH 4.7 (not 4.6), I'm not sure whether this can
> cause problems.
> I've also tried the new Spark 1.0.0, with no luck...
>
> I've configured the environment as described in
>
> http://www.cloudera.com/content/cloudera-content/cloudera-docs/CM4Ent/4.8.1/Cloudera-Manager-Installation-Guide/cmig_spark_installation_standalone.html
>
> I'm using a standalone deployment.
>
> When launching spark-shell (for testing), everything seems fine (the
> process gets registered with the master).
> But when I try to run the example from the installation page,
> Kerberos blocks access to HDFS:
> scala> val file = sc.textFile("hdfs://m1hadoop.polito.it:8020/user/finamore/data")
> 14/06/16 22:28:36 INFO storage.MemoryStore: ensureFreeSpace(135653) called with curMem=0, maxMem=308713881
> 14/06/16 22:28:36 INFO storage.MemoryStore: Block broadcast_0 stored as values to memory (estimated size 132.5 KB, free 294.3 MB)
> file: org.apache.spark.rdd.RDD[String] = MappedRDD[1] at textFile at <console>:12
>
> scala> val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
> java.io.IOException: Can't get Master Kerberos principal for use as renewer
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
>     at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
>     at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:187)
>     at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:251)
>     at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:140)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
>     at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
>     at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
>     at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
>     at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>     at scala.Option.getOrElse(Option.scala:120)
>     at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
>     at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:58)
>     at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:354)
>     at $iwC$$iwC$$iwC$$iwC.<init>(<console>:14)
>     at $iwC$$iwC$$iwC.<init>(<console>:19)
>     at $iwC$$iwC.<init>(<console>:21)
>     at $iwC.<init>(<console>:23)
>     at <init>(<console>:25)
>     at .<init>(<console>:29)
>     at .<clinit>(<console>)
>     at .<init>(<console>:7)
>     at .<clinit>(<console>)
>     at $print(<console>)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:616)
>     at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
>     at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
>     at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
>     at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
>     at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:788)
>     at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:833)
>     at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:745)
>     at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:593)
>     at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:600)
>     at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:603)
>     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:926)
>     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
>     at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
>     at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:876)
>     at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:968)
>     at org.apache.spark.repl.Main$.main(Main.scala:31)
>     at org.apache.spark.repl.Main.main(Main.scala)
>
>
> Of course, I ran kinit before starting the shell, and the same user can
> also access HDFS from the command line.
> I guess Spark is not reading the configuration properly.
> As written in the Cloudera documentation, I've specified
> DEFAULT_HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
> ...which also contains the proper definition of the Kerberos principal.
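
The stack trace shows the exception coming from Hadoop's TokenCache while it obtains HDFS delegation tokens. My understanding is that this happens when the client-side configuration it loads does not name a Kerberos principal for the JobTracker or ResourceManager, which is used as the token renewer. That would be consistent with Spark 0.9 not picking up the right conf directory.

The Scala sketch below is a rough way to check this. It reads the *-site.xml files with the JDK's XML parser and reports whether Kerberos is enabled and whether either candidate renewer key is set. The object and method names are hypothetical. The two candidate keys are an assumption based on Hadoop 2.x behaviour, not something confirmed in this thread.

```scala
import java.io.File
import javax.xml.parsers.DocumentBuilderFactory
import org.w3c.dom.Element

object RenewerPrincipalCheck {
  // Assumption: the renewer principal normally comes from one of these keys,
  // depending on whether the client is set up for MR1 (JobTracker) or YARN (ResourceManager).
  val CandidateKeys = Seq("mapreduce.jobtracker.kerberos.principal", "yarn.resourcemanager.principal")

  /** Parse the <property><name/><value/></property> entries of one Hadoop *-site.xml file. */
  def readSiteFile(file: File): Map[String, String] = {
    val doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(file)
    val props = doc.getElementsByTagName("property")
    (0 until props.getLength).flatMap { i =>
      val p = props.item(i).asInstanceOf[Element]
      // Trimmed text of the first child element with this tag, if any.
      def text(tag: String): Option[String] = {
        val nodes = p.getElementsByTagName(tag)
        if (nodes.getLength == 0) None else Some(nodes.item(0).getTextContent.trim)
      }
      (for (n <- text("name"); v <- text("value")) yield n -> v).toList
    }.toMap
  }

  /** Merge existing files in order; later files override earlier ones (final properties are ignored here). */
  def loadConf(dir: File, names: Seq[String]): Map[String, String] =
    names.map(new File(dir, _)).filter(_.isFile).foldLeft(Map.empty[String, String]) { (acc, f) =>
      acc ++ readSiteFile(f)
    }

  def kerberosEnabled(conf: Map[String, String]): Boolean =
    conf.get("hadoop.security.authentication").exists(_.equalsIgnoreCase("kerberos"))

  /** Candidate renewer-principal keys that are set to a non-empty value. */
  def renewerCandidates(conf: Map[String, String]): Map[String, String] =
    CandidateKeys.flatMap(k => conf.get(k).filter(_.nonEmpty).map(v => k -> v).toList).toMap

  def main(args: Array[String]): Unit = {
    val dir = new File(args.headOption.getOrElse("/etc/hadoop/conf"))
    val conf = loadConf(dir, Seq("core-site.xml", "hdfs-site.xml", "mapred-site.xml", "yarn-site.xml"))
    println(s"Kerberos enabled: ${kerberosEnabled(conf)}")
    val found = renewerCandidates(conf)
    if (found.isEmpty) println("No candidate renewer principal key is set in " + dir.getPath)
    else found.foreach { case (k, v) => println(s"$k = $v") }
  }
}
```

Pointing it at each candidate directory (for example the conf dir under DEFAULT_HADOOP_HOME versus /etc/hadoop/conf) would show which one actually carries the principal definition.
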
>
> Any idea of what I'm missing?
>
> Thanks!
>
> --
> --------------------------------------------------
> Alessandro Finamore, PhD
> Politecnico di Torino
> --
> Office:    +39 0115644127
> Mobile:   +39 3280251485
> SkypeId: alessandro.finamore
> ---------------------------------------------------
>



