I get the same problem with k=100, p=15 and aBlockRows=200000, only faster now
(after around 20 minutes).
I just realized that it fails at the final step of the processing (I’ve included
the end of the log below).
Any suggestions? In my Eclipse project I have imported:
httpclient-4.2.5.jar
mahout-hdfs-0.10.0.jar
mahout-integration-0.10.0.jar
mahout-math-0.10.0.jar
mahout-mr-0.10.0-job.jar
mahout-mr-0.10.0.jar
The strange part is that it works fine if I run it directly from the terminal.
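
A small check along these lines could show whether the Eclipse launch and the terminal run actually see the same classes. It is only a sketch: the class name ClasspathCheck is made up for illustration, and the idea that the class from the stack trace (package org.apache.commons.httpclient) comes from the older Commons HttpClient 3.x jar rather than from httpclient-4.x (package org.apache.http) is an assumption to confirm, not something established in this thread.

```java
import java.io.File;

class ClasspathCheck {
    // Returns true if the named class can be found by the loader that loaded this class.
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class named in the stack trace (assumed to belong to Commons HttpClient 3.x)
        String legacy = "org.apache.commons.httpclient.HttpMethod";
        // Class from the org.apache.http package used by the httpclient-4.x jars
        String current = "org.apache.http.client.HttpClient";
        System.out.println(legacy + " loadable: " + isLoadable(legacy));
        System.out.println(current + " loadable: " + isLoadable(current));
        // Print every classpath entry so the two environments can be compared
        for (String entry : System.getProperty("java.class.path").split(File.pathSeparator)) {
            System.out.println("  " + entry);
        }
    }
}
```

Running it once from the same Eclipse run configuration and once from the terminal, then comparing the two outputs, should show which jar is present in one environment and missing in the other.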
Thanks!
Mihai
1415149 [Thread-13] INFO org.apache.hadoop.mapred.LocalJobRunner - map task
executor complete.
1415151 [Thread-13] DEBUG
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Merging data from
DeprecatedRawLocalFileStatus{path=file:/Users/mihaidascalu/Dropbox
(Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/_temporary/0/task_local1889167692_0001_m_000000;
isDirectory=true; modification_time=1430245778000; access_time=0; owner=;
group=; permission=rwxrwxrwx; isSymlink=false} to
file:/Users/mihaidascalu/Dropbox
(Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job
1415151 [Thread-13] DEBUG
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter - Merging data from
DeprecatedRawLocalFileStatus{path=file:/Users/mihaidascalu/Dropbox
(Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/_temporary/0/task_local1889167692_0001_m_000000/part-m-00000.deflate;
isDirectory=false; length=8; replication=1; blocksize=33554432;
modification_time=1430247134000; access_time=0; owner=; group=;
permission=rw-rw-rw-; isSymlink=false} to file:/Users/mihaidascalu/Dropbox
(Personal)/Workspace/Eclipse/ReaderBenchDev/config/LSA/tasa_lak_pos_en/svd_out/Q-job/part-m-00000.deflate
1415157 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1889167692_0001
java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more
1415159 [Thread-13] DEBUG org.apache.hadoop.security.UserGroupInformation -
PrivilegedAction as:mihaidascalu (auth:SIMPLE)
from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:330)
Exception in thread "Thread-13" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
    at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more
1415709 [SwingWorker-pool-1-thread-1] DEBUG
org.apache.hadoop.security.UserGroupInformation - PrivilegedAction
as:mihaidascalu (auth:SIMPLE)
from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
1415709 [SwingWorker-pool-1-thread-1] DEBUG
org.apache.hadoop.security.UserGroupInformation - PrivilegedAction
as:mihaidascalu (auth:SIMPLE)
from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
1415709 [SwingWorker-pool-1-thread-1] DEBUG
org.apache.hadoop.security.UserGroupInformation - PrivilegedAction
as:mihaidascalu (auth:SIMPLE)
from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:311)
1415709 [SwingWorker-pool-1-thread-1] ERROR
view.widgets.semanticModels.SemanticModelsTraining - Error procesing
config/LDA directory: Q job unsuccessful.
> On 28 Apr 2015, at 10:32, Dmitriy Lyubimov <[email protected]> wrote:
>
> if your run time gets too high, try starting with a low -k (like 10 or
> something) and -q=0; that will significantly reduce the complexity of the
> problem.
>
> if this works, you need to find optimal levers that suit your
> hardware/input size/runtime requirements. (I can tell you right away that
> the (k+p) value influences single-task runtime according to a power law.)
> Something like -k 500 will probably never yield a satisfactory time. The
> performance study in Nathan Halko's dissertation computed the first 100
> singular values/vectors iirc, i.e. about k=100, p=15.
>
> getting -q=1 boosts accuracy significantly, so if you can afford it at all
> time-wise, i'd suggest using -q=1 instead of cranking the -p parameter up from
> the default value. Values of -q > 1 are never practical.
>
>
> -d
>
>
>
> On Tue, Apr 28, 2015 at 10:03 AM, Mihai Dascalu <[email protected]> wrote:
>
>> I’ve created a Swing interface around the invocation, but I don’t think it
>> is a classpath setting, as the SVD runs for more than 1h before failing. Only
>> afterwards do I get the runtime error about the HttpClient class, which is
>> really strange. Also, I see a lot of map operations in the console, but no
>> reduce operations are logged.
>>
>> Thanks!
>> Mihai
>>
>>> On 28 Apr 2015, at 01:09, lastarsenal <[email protected]> wrote:
>>>
>>> What's your run command? I think it is because of your classpath setting.
>>>
>>>
>>>
>>>
>>> At 2015-04-28 15:25:01, "Mihai Dascalu" <[email protected]> wrote:
>>>> Hi!
>>>>
>>>>
>>>> I’ve been experimenting with the SSVDSolver and unfortunately, during
>>>> runtime, I encounter this error:
>>>>
>>>> 10648576 [Thread-13] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1958711697_0001
>>>> java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:546)
>>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>     ... 1 more
>>>>
>>>> Exception in thread "Thread-13" java.lang.NoClassDefFoundError: org/apache/commons/httpclient/HttpMethod
>>>>     at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:562)
>>>> Caused by: java.lang.ClassNotFoundException: org.apache.commons.httpclient.HttpMethod
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
>>>>     ... 1 more
>>>>
>>>> The actual invocation is:
>>>>
>>>> public static void runSSVDOnSparseVectors(String inputPath,
>>>>         String outputPath, int rank, int oversampling, int blocks,
>>>>         int reduceTasks, int powerIterations, boolean halfSigma)
>>>>         throws IOException {
>>>>     Configuration conf = new Configuration();
>>>>     // Arguments: conf, input paths, output path, aBlockRows, k, p, reduce tasks
>>>>     SSVDSolver solver = new SSVDSolver(conf,
>>>>             new Path[] { new Path(inputPath) }, new Path(outputPath),
>>>>             blocks, rank, oversampling, reduceTasks);
>>>>     solver.setQ(powerIterations); // number of power iterations (-q)
>>>>     if (halfSigma) {
>>>>         // Also request the half-sigma variants of the U and V outputs
>>>>         solver.setcUHalfSigma(true);
>>>>         solver.setcVHalfSigma(true);
>>>>     }
>>>>     solver.run();
>>>> }
>>>>
>>>> while being invoked with (input.getParent() + "/" + TERM_DOC_MATRIX_NAME,
>>>> input.getParent() + "/" + SVD_FOLDER_NAME, k, 2 * k,
>>>> Math.min(200000, (int) (3 * k * 0.01 *
>>>> Math.max(lsaTraining.getNoDocuments(), lsaTraining.getNoWords()))), 5, 2,
>>>> true);
>>>>
>>>> I’m using Mahout 0.10 with httpclient-4.4.1.jar (I also tried 4.2.5
>>>> from the package archive) on a 48k words x 53k docs matrix.
>>>>
>>>> Any ideas? It works fine with similar variables if I run the job from
>>>> the command line.
>>>>
>>>> Also, how should I tweak the input variables?
>>>>
>>>>
>>>> Thanks in advance!
>>>> Mihai
>>
>>