If your run time gets too high, try starting with a low -k (say, 10) and -q=0; that will significantly reduce the complexity of the problem.
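A rough sketch of that baseline next to a heavier setting, in Java since that is what the invocation quoted below uses. The Params class and its method names are made up for illustration and are not part of Mahout. p=15 is simply the value mentioned for the performance study and is assumed here, and k=100 in the second case is a hypothetical target, since the actual k isn't given in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class SsvdBaseline {

    /** Hypothetical holder for the three knobs discussed here; not a Mahout class. */
    static final class Params {
        final int k; // -k: requested rank
        final int p; // -p: oversampling
        final int q; // -q: power iterations

        Params(int k, int p, int q) {
            if (k <= 0 || p < 0 || q < 0) {
                throw new IllegalArgumentException("need k > 0, p >= 0, q >= 0");
            }
            this.k = k;
            this.p = p;
            this.q = q;
        }

        /** The (k+p) value that drives single-task runtime. */
        int width() {
            return k + p;
        }

        /** The three flags in the form used in this message. */
        List<String> toArgs() {
            return Arrays.asList(
                    "-k", String.valueOf(k),
                    "-p", String.valueOf(p),
                    "-q", String.valueOf(q));
        }
    }

    public static void main(String[] args) {
        // Cheap first run: low rank, no power iterations (p=15 is an assumption).
        Params baseline = new Params(10, 15, 0);

        // Same shape as the quoted call: oversampling = 2 * k, q = 2.
        int k = 100; // hypothetical target rank
        Params quotedStyle = new Params(k, 2 * k, 2);

        System.out.println(String.join(" ", baseline.toArgs())
                + "  (k+p = " + baseline.width() + ")");
        System.out.println(String.join(" ", quotedStyle.toArgs())
                + "  (k+p = " + quotedStyle.width() + ")");
    }
}
```

The point is only to make the (k+p) gap visible: 25 for the baseline versus 300 when oversampling is tied to 2 * k at k=100.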
If this works, you need to find the optimal levers for your hardware, input size and runtime requirements. (I can tell you right away that the (k+p) value influences single-task runtime according to a power law.) Something like -k 500 will probably never yield a satisfactory time. The performance study in Nathan Halko's dissertation computed the first 100 singular values/vectors iirc, i.e. about k=100, p=15.

Setting -q=1 boosts accuracy significantly, so if you can afford it at all time-wise, I'd suggest using -q=1 instead of cranking the -p parameter up from its default value. Values of -q > 1 are never practical.

-d

On Tue, Apr 28, 2015 at 10:03 AM, Mihai Dascalu <[email protected]> wrote:

> I’ve created a SWING interface around the invocation, but it is not a
> classpath setting as the SVD runs for more than 1h. Afterwards I have the
> runtime error in the HTTPclient, which is really strange. Also I have a lot
> of map operations in the console, but no reduce operations are logged.
>
> Thanks!
> Mihai
>
>
> On 28 Apr 2015, at 01:09, lastarsenal <[email protected]> wrote:
> >
> > What's your run command? I think it is because of your classpath setting.
> >
> >
> > At 2015-04-28 15:25:01, "Mihai Dascalu" <[email protected]> wrote:
> >> Hi!
> >>
> >> I’ve been experimenting with the SSVDSolver and unfortunately, during runtime, I encounter this error:
> >>
> >> 10648576 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1958711697_0001
> >> java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
> >>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
> >> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
> >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>     ... 1 more
> >>
> >> Exception in thread "Thread-13" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
> >>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
> >> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
> >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>     at java.security.AccessController.doPrivileged(Native Method)
> >>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>     ... 1 more
> >>
> >> The actual invocation is:
> >>
> >> public static void runSSVDOnSparseVectors(String inputPath,
> >>         String outputPath, int rank, int oversampling, int blocks,
> >>         int reduceTasks, int powerIterations, boolean halfSigma)
> >>         throws IOException {
> >>     Configuration conf = new Configuration();
> >>     SSVDSolver solver = new SSVDSolver(conf, new Path[] { new Path(
> >>             inputPath) }, new Path(outputPath), blocks, rank, oversampling,
> >>             reduceTasks);
> >>     solver.setQ(powerIterations);
> >>     if (halfSigma) {
> >>         solver.setcUHalfSigma(true);
> >>         solver.setcVHalfSigma(true);
> >>     }
> >>     solver.run();
> >> }
> >>
> >> while being invoked with (input.getParent() + "/" + TERM_DOC_MATRIX_NAME, input.getParent() + "/" + SVD_FOLDER_NAME, k, 2 * k, Math.min(200000, (int) (3 * k * 0.01 * Math.max(lsaTraining.getNoDocuments(),lsaTraining.getNoWords()))), 5, 2, true);
> >>
> >> I’m using Mahout 0.10 with httpclient-4.4.1.jar (I tried also 4.2.5 from the package archive) on a 48k words X 53k docs matrix.
> >>
> >> Any ideas? It works fine with the similar variables if I run the job in command line.
> >>
> >> Also, how should I tweak the input variables?
> >>
> >>
> >> Thanks in advance!
> >> Mihai
> >
