Hello again,
maybe my question was misleading.
I am asking whether the intended usage is to bundle the required
libraries with the job and ship them to YARN along with it (and if so,
how can this be done?), or to add the required classes to the classpath
of every node in the cluster.
What is the best practice?
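To make the first option concrete, here is a minimal sketch of what I have in mind, not a confirmed solution. It only builds the comma-separated list of jar URIs; the class name JobJars, the method tmpJarsValue and the library directory are hypothetical names I made up for illustration. The assumption is that setting the "tmpjars" property on the job configuration (the property that, as far as I know, the -libjars command line option fills in) makes Hadoop ship those jars with the job.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class JobJars {

    /**
     * Collects every *.jar file directly inside libDir and returns their
     * file: URIs as one comma-separated string, sorted for stable output.
     * Assumption: this is the format the "tmpjars" property accepts.
     */
    public static String tmpJarsValue(Path libDir) throws IOException {
        List<String> uris = new ArrayList<>();
        // The glob restricts the listing to jar files; other files are skipped.
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(libDir, "*.jar")) {
            for (Path jar : jars) {
                uris.add(jar.toAbsolutePath().toUri().toString());
            }
        }
        Collections.sort(uris);
        return String.join(",", uris);
    }
}
```

In my code this would presumably be used before calling the driver, along the lines of hdoopConf.set("tmpjars", JobJars.tmpJarsValue(Paths.get("/path/to/mahout/lib"))), where the path is a placeholder for wherever the Mahout jars live on the web application host. I have not verified that this works on my cluster.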
Best,
Max
On 01/07/2015 06:13 PM, mw wrote:
Hello,
the first error was due to a missing property in yarn.xml. However, now
I have a different problem.
I am working on a web application that should execute LDA on an
external YARN cluster.
I am uploading all the relevant sequence files onto the YARN cluster.
This is how I try to remotely execute LDA on the cluster:
try {
    ugi.doAs(new PrivilegedExceptionAction<Void>() {
        public Void run() throws Exception {
            Configuration hdoopConf = new Configuration();
            hdoopConf.set("fs.defaultFS",
                    "hdfs://xxx.xxx.xxx.xxx:9000/user/xx");
            hdoopConf.set("yarn.resourcemanager.hostname",
                    "xxx.xxx.xxx.xxx");
            hdoopConf.set("mapreduce.framework.name", "yarn");
            hdoopConf.set("mapred.framework.name", "yarn");
            hdoopConf.set("mapred.job.tracker",
                    "xxx.xxx.xxx.xxx");
            hdoopConf.set("dfs.permissions.enabled", "false");
            hdoopConf.set("hadoop.job.ugi", "xx");
            hdoopConf.set("mapreduce.jobhistory.address", "xxx.xxx.xxx.xxx:10020");
            CVB0Driver driver = new CVB0Driver();
            try {
                driver.run(hdoopConf,
                        sparseVectorIn.suffix("/matrix"),
                        topicsOut, k, numTerms,
                        doc_topic_smoothening, term_topic_smoothening,
                        maxIter, iteration_block_size,
                        convergenceDelta,
                        sparseVectorIn.suffix("/dictionary.file-0"),
                        topicsOut.suffix("/DocumentTopics/"), sparseVectorIn,
                        seed, testFraction, numTrainThreads,
                        numUpdateThreads, maxItersPerDoc,
                        numReduceTasks, backfillPerplexity);
            } catch (ClassNotFoundException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return null;
        }
    });
} catch (InterruptedException e) {
    e.printStackTrace();
}
I am getting the following error message:
Error: java.lang.ClassNotFoundException: org.apache.mahout.math.Vector
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:344)
    at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1844)
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1809)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1903)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1929)
    at org.apache.hadoop.mapred.JobConf.getMapOutputValueClass(JobConf.java:837)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.init(MapTask.java:983)
    at org.apache.hadoop.mapred.MapTask.createSortingCollector(MapTask.java:391)
    at org.apache.hadoop.mapred.MapTask.access$100(MapTask.java:80)
    at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:675)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:747)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
java.lang.InterruptedException: Failed to complete iteration 1 stage 1
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.runIteration(CVB0Driver.java:502)
    at org.apache.mahout.clustering.lda.cvb.CVB0Driver.run(CVB0Driver.java:319)
    ...
So apparently the map tasks cannot find some Mahout classes on their
classpath. How can I provide the required classes to YARN?
Best,
Max