I'm trying to get my feet wet with Spark. I've done some simple stuff in the
shell in standalone mode, and now I'm trying to connect to HDFS resources, but
I'm running into a problem.
I synced to git's master branch (c399baa - "SPARK-1456 Remove view bounds on
Ordered in favor of a context bound on Ordering. (3 days ago) <Michael
Armbrust>") and built like so:
SPARK_HADOOP_VERSION=2.2.0 SPARK_YARN=true sbt/sbt assembly
This created various jars in various places, including these (I think):
./examples/target/scala-2.10/spark-examples-assembly-1.0.0-SNAPSHOT.jar
./tools/target/scala-2.10/spark-tools-assembly-1.0.0-SNAPSHOT.jar
./assembly/target/scala-2.10/spark-assembly-1.0.0-SNAPSHOT-hadoop2.2.0.jar
In `conf/spark-env.sh`, I added this (actually before I did the assembly):
export HADOOP_CONF_DIR=/etc/hadoop/conf
Now I fire up the shell (bin/spark-shell) and try to grab data from HDFS, and
get the following exception:
scala> var hdf = sc.hadoopFile("hdfs:///user/kwilliams/dat/part-m-00000")
hdf: org.apache.spark.rdd.RDD[(Nothing, Nothing)] = HadoopRDD[0] at hadoopFile at <console>:12
scala> hdf.count()
java.lang.RuntimeException: java.lang.InstantiationException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:131)
at org.apache.spark.rdd.HadoopRDD.getInputFormat(HadoopRDD.scala:155)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:168)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:209)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:207)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1064)
at org.apache.spark.rdd.RDD.count(RDD.scala:806)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:15)
at $iwC$$iwC$$iwC.<init>(<console>:20)
at $iwC$$iwC.<init>(<console>:22)
at $iwC.<init>(<console>:24)
at <init>(<console>:26)
at .<init>(<console>:30)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:777)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1045)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:614)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:645)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:609)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:796)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:841)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:753)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:601)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:608)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:611)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:936)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:884)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:884)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:982)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
Caused by: java.lang.InstantiationException
at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:48)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:129)
... 41 more
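One detail in the session above may be relevant: the RDD is reported as
RDD[(Nothing, Nothing)], so no key, value, or input format types were given
to hadoopFile, and the trace fails inside HadoopRDD.getInputFormat. For
comparison, here is a sketch of the same read with explicit type parameters.
It assumes the file is plain text; LongWritable, Text and TextInputFormat are
my guesses about the data, not something confirmed, and a different file
format (e.g. a SequenceFile) would need different types.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat

// Assumes the SparkContext `sc` that spark-shell predefines.
// Assumption: the file is plain text, so the old-API TextInputFormat applies
// (keys are byte offsets, values are lines).
val hdf = sc.hadoopFile[LongWritable, Text, TextInputFormat](
  "hdfs:///user/kwilliams/dat/part-m-00000")
hdf.count()

// Under the same plain-text assumption, textFile is the shorter route and
// yields an RDD[String] of lines.
val lines = sc.textFile("hdfs:///user/kwilliams/dat/part-m-00000")
lines.count()
```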
Is this recognizable to anyone as a build problem, or a config problem, or
anything? Failing that, any way to get more information about where in the
process it's failing?
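One check that can be run from the same shell, as a rough sketch: since the
path above is hdfs:/// with no host, it depends on the default filesystem
from the Hadoop configuration, so printing that value should show whether
HADOOP_CONF_DIR is being picked up. This only assumes the predefined `sc`.

```scala
// Assumes the SparkContext `sc` that spark-shell predefines.
val hadoopConf = sc.hadoopConfiguration

// Hadoop 2.x key for the default filesystem.
println(hadoopConf.get("fs.defaultFS"))
// Older name for the same setting, kept for comparison.
println(hadoopConf.get("fs.default.name"))
```

If the configuration directory was not picked up, I'd expect the
local-filesystem default (file:///) here rather than the cluster's NameNode
URI.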
Thanks.
--
Ken Williams, Senior Research Scientist
WindLogics
http://windlogics.com