I looked into the source code of SparkHadoopMapReduceUtil.scala. I think it is
broken in the following code:
def newTaskAttemptContext(conf: Configuration, attemptId: TaskAttemptID):
TaskAttemptContext = { val klass = firstAvailableClass(
"org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl", // hadoop2,
hadoop2-yarn "org.apache.hadoop.mapreduce.TaskAttemptContext")
// hadoop1 val ctor = klass.getDeclaredConstructor(classOf[Configuration],
classOf[TaskAttemptID]) ctor.newInstance(conf,
attemptId).asInstanceOf[TaskAttemptContext] }
In other words, it is related to hadoop2, hadoop2-yarn, and hadoop1. Any
suggestion how to resolve it?
Thanks.
> From: [email protected]
> Date: Fri, 23 Jan 2015 14:01:45 +0000
> Subject: Re: spark 1.1.0 save data to hdfs failed
> To: [email protected]
> CC: [email protected]
>
> These are all definitely symptoms of mixing incompatible versions of
> libraries.
>
> I'm not suggesting you haven't excluded Spark / Hadoop, but, this is
> not the only way Hadoop deps get into your app. See my suggestion
> about investigating the dependency tree.
>
> On Fri, Jan 23, 2015 at 1:53 PM, ey-chih chow <[email protected]> wrote:
> > Thanks. But I think I already mark all the Spark and Hadoop reps as
> > provided. Why the cluster's version is not used?
> >
> > Any way, as I mentioned in the previous message, after changing the
> > hadoop-client to version 1.2.1 in my maven deps, I already pass the
> > exception and go to another one as indicated below. Any suggestion on this?
> >
> > =================================
> >
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > at
> > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> > at
> > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> > at java.lang.reflect.Method.invoke(Method.java:606)
> > at
> > org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:40)
> > at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
> > Caused by: java.lang.IncompatibleClassChangeError: Implementing class
> > at java.lang.ClassLoader.defineClass1(Native Method)
> > at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
> > at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
> > at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
> > at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> > at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> > at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> > at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> > at java.lang.Class.forName0(Native Method)
> > at java.lang.Class.forName(Class.java:191)
> > at
> > org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.firstAvailableClass(SparkHadoopMapReduceUtil.scala:73)
> > at
> > org.apache.hadoop.mapreduce.SparkHadoopMapReduceUtil$class.newTaskAttemptContext(SparkHadoopMapReduceUtil.scala:35)
> > at
> > org.apache.spark.rdd.PairRDDFunctions.newTaskAttemptContext(PairRDDFunctions.scala:53)
> > at
> > org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:932)
> > at
> > org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:832)
> > at com.crowdstar.etl.ParseAndClean$.main(ParseAndClean.scala:103)
> > at com.crowdstar.etl.ParseAndClean.main(ParseAndClean.scala)
> >
> > ... 6 more
> >