It sounds more like Spark is not able to find the Hadoop jars. Try setting HADOOP_CONF_DIR, and also make sure the *-site.xml files are available on the CLASSPATH/SPARK_CLASSPATH.
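As a quick sanity check (a minimal sketch in Scala; the object name is made up, adjust to your own setup), you can verify whether the Hadoop 2.x class that Spark fails to load in your quoted log below is resolvable on your application classpath at all:

    // Hypothetical helper: checks whether the class from the
    // ClassNotFoundException in the log is visible to the JVM.
    object HadoopClasspathCheck {
      def main(args: Array[String]): Unit = {
        try {
          Class.forName("org.apache.hadoop.mapred.InputSplitWithLocationInfo")
          println("Hadoop 2.x mapred classes are visible on the classpath")
        } catch {
          case _: ClassNotFoundException =>
            println("Hadoop 2.x mapred classes are missing - check which Hadoop " +
              "version Maven resolved (e.g. with mvn dependency:tree)")
        }
      }
    }

If the class is missing even though Spark itself is on the classpath, it is worth checking whether the newly added dependency pulls in an older Hadoop artifact that Maven resolves in place of the one Spark expects.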
Thanks
Best Regards

On Mon, Jan 26, 2015 at 7:28 PM, Staffan <staffan.arvids...@gmail.com> wrote:

> I'm using Maven and Eclipse to build my project. I'm letting Maven download
> all the things I need for running everything, which has worked fine up until
> now. I need to use the CDK library (https://github.com/egonw/cdk,
> http://sourceforge.net/projects/cdk/), and once I add the dependencies to my
> pom.xml, Spark starts to complain (this is without calling any function or
> importing any new library into my code, only by introducing new dependencies
> to the pom.xml). Trying to set up a SparkContext gives me the following
> errors in the log:
>
> [main] DEBUG org.apache.spark.rdd.HadoopRDD - SplitLocationInfo and other
> new Hadoop classes are unavailable. Using the older Hadoop location info
> code.
> java.lang.ClassNotFoundException:
> org.apache.hadoop.mapred.InputSplitWithLocationInfo
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:191)
>   at org.apache.spark.rdd.HadoopRDD$SplitInfoReflections.<init>(HadoopRDD.scala:381)
>   at org.apache.spark.rdd.HadoopRDD$.liftedTree1$1(HadoopRDD.scala:391)
>   at org.apache.spark.rdd.HadoopRDD$.<init>(HadoopRDD.scala:390)
>   at org.apache.spark.rdd.HadoopRDD$.<clinit>(HadoopRDD.scala)
>   at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:159)
>   at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:194)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>   at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1328)
>   at org.apache.spark.rdd.RDD.foreach(RDD.scala:765)
>
> Later in the log:
>
> [Executor task launch worker-0] DEBUG org.apache.spark.deploy.SparkHadoopUtil -
> Couldn't find method for retrieving thread-level FileSystem input data
> java.lang.NoSuchMethodException:
> org.apache.hadoop.fs.FileSystem$Statistics.getThreadStatistics()
>   at java.lang.Class.getDeclaredMethod(Class.java:2009)
>   at org.apache.spark.util.Utils$.invoke(Utils.scala:1733)
>   at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:178)
>   at org.apache.spark.deploy.SparkHadoopUtil$$anonfun$getFileSystemThreadStatistics$1.apply(SparkHadoopUtil.scala:178)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatistics(SparkHadoopUtil.scala:178)
>   at org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:138)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:220)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:210)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:99)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
>
> There have also been issues related to "HADOOP_HOME" not being set etc., but
> those seem to be intermittent and only occur sometimes.
>
> After testing different versions of both CDK and Spark, I've found that
> Spark 0.9.1 and earlier DO NOT have this problem, so there is something in
> the newer versions of Spark that does not play well with others... However,
> I need the functionality in the later versions of Spark, so this does not
> solve my problem. Anyone willing to try to reproduce the issue can do so by
> adding the dependency for CDK:
>
> <dependency>
>   <groupId>org.openscience.cdk</groupId>
>   <artifactId>cdk-fingerprint</artifactId>
>   <version>1.5.10</version>
> </dependency>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Issues-when-combining-Spark-and-a-third-party-java-library-tp21367.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org