If your run time gets too high, try starting with a low -k (10 or so)
and -q=0; that will significantly reduce the complexity of the
problem.

If this works, you need to find the optimal levers that suit your
hardware, input size, and runtime requirements. (I can tell you right away
that the (k+p) value influences single-task runtime according to a power
law.) Something like -k 500 will probably never yield a satisfactory time.
The performance study in Nathan Halko's dissertation computed the first 100
singular values/vectors, iirc, i.e. about k=100, p=15.

Setting -q=1 boosts accuracy significantly, so if you can afford it at all
time-wise, I'd suggest using -q=1 instead of cranking the -p parameter up
from its default value. Values of -q > 1 are never practical.
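
To make the order of experiments concrete, here is a rough sketch in plain
Java. It makes no Mahout calls; the class SsvdTuningSketch and the names
Setting and ladder are made up for illustration, and p=15 is simply the
value quoted above, assumed here to be the default. It only lists the
settings in the order described above, cheapest first, together with the
k+p width each one implies:

```java
import java.util.ArrayList;
import java.util.List;

public class SsvdTuningSketch {

    /** One candidate setting: rank k, oversampling p, power iterations q. */
    static final class Setting {
        final int k;
        final int p;
        final int q;

        Setting(int k, int p, int q) {
            this.k = k;
            this.p = p;
            this.q = q;
        }

        /** k + p, the quantity said above to drive single-task runtime. */
        int width() {
            return k + p;
        }

        @Override
        public String toString() {
            return "-k " + k + " -p " + p + " -q " + q;
        }
    }

    /**
     * Cheapest-first list of settings to try: a small smoke test with q=0,
     * then the target rank with q=0, then q=1 while p stays at its default.
     */
    static List<Setting> ladder(int targetK, int defaultP) {
        List<Setting> steps = new ArrayList<>();
        steps.add(new Setting(Math.min(10, targetK), defaultP, 0));
        steps.add(new Setting(targetK, defaultP, 0));
        steps.add(new Setting(targetK, defaultP, 1));
        return steps;
    }

    public static void main(String[] args) {
        // targetK=100 and p=15 mirror the figures mentioned above.
        for (Setting s : ladder(100, 15)) {
            System.out.println(s + "  (k+p = " + s.width() + ")");
        }
    }
}
```

Each line it prints would map onto the rank, oversampling and
powerIterations arguments of your own runSSVDOnSparseVectors method; time
each run before moving on to the next one.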


-d



On Tue, Apr 28, 2015 at 10:03 AM, Mihai Dascalu <[email protected]>
wrote:

> I’ve created a SWING interface around the invocation, but it is not a
> classpath setting as the SVD runs for more than 1h. Afterwards I have the
> runtime error in the HTTPclient, which is really strange. Also I have a lot
> of map operations in the console, but no reduce operations are logged.
>
> Thanks!
> Mihai
>
> > On 28 Apr 2015, at 01:09, lastarsenal <[email protected]> wrote:
> >
> > What's your run command? I think it is because of your classpath setting.
> >
> >
> >
> >
> > At 2015-04-28 15:25:01, "Mihai Dascalu" <[email protected]> wrote:
> >> Hi!
> >>
> >>
> >> I’ve been experimenting with the SSVDSolver and unfortunately, during
> runtime, I encounter this error:
> >>
> >> 10648576 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner  -
> job_local1958711697_0001
> >> java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
> >>      at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
> >> Caused by: java.lang.ClassNotFoundException:
> org.apache.commons.httpclient.HttpMethod
> >>      at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>      at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>      at java.security.AccessController.doPrivileged(Native Method)
> >>      at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>      ... 1 more
> >>
> >> Exception in thread "Thread-13" java.lang.NoClassDefFoundError:
> org/apache/commons/httpclient/HttpMethod
> >>      at
> org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
> >> Caused by: java.lang.ClassNotFoundException:
> org.apache.commons.httpclient.HttpMethod
> >>      at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
> >>      at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
> >>      at java.security.AccessController.doPrivileged(Native Method)
> >>      at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
> >>      at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
> >>      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> >>      at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
> >>      ... 1 more
> >>
> >> The actual invocation is:
> >>
> >> public static void runSSVDOnSparseVectors(String inputPath,
> >>                      String outputPath, int rank, int oversampling, int blocks,
> >>                      int reduceTasks, int powerIterations, boolean halfSigma)
> >>                      throws IOException {
> >>      Configuration conf = new Configuration();
> >>      SSVDSolver solver = new SSVDSolver(conf, new Path[] { new Path(
> >>                      inputPath) }, new Path(outputPath), blocks, rank, oversampling,
> >>                      reduceTasks);
> >>      solver.setQ(powerIterations);
> >>      if (halfSigma) {
> >>              solver.setcUHalfSigma(true);
> >>              solver.setcVHalfSigma(true);
> >>      }
> >>      solver.run();
> >> }
> >>
> >> while being invoked with (input.getParent() + "/" + TERM_DOC_MATRIX_NAME,
> >> input.getParent() + "/" + SVD_FOLDER_NAME, k, 2 * k,
> >> Math.min(200000, (int) (3 * k * 0.01 *
> >> Math.max(lsaTraining.getNoDocuments(), lsaTraining.getNoWords()))), 5, 2, true);
> >>
> >> I’m using Mahout 0.10 with httpclient-4.4.1.jar (I tried also 4.2.5
> from the package archive) on a 48k words X 53k docs matrix.
> >>
> >> Any ideas? It works fine with the similar variables if I run the job in
> command line.
> >>
> >> Also, how should I tweak the input variables?
> >>
> >>
> >> Thanks in advance!
> >> Mihai
>
>
