Also, please update to Mahout 0.9; you're two versions behind.
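As a note on the heap advice in the thread below: on Hadoop 0.20.x the heap for each map/reduce child JVM is set via the `mapred.child.java.opts` property, while `MAHOUT_HEAPSIZE` only sizes the client-side driver JVM, not the child tasks that are actually failing here. A minimal sketch of the relevant fragment; the 1024m value is illustrative, not a recommendation, and should be sized to the memory available per task slot:

```xml
<!-- mapred-site.xml: JVM options for each map/reduce child task.
     The Hadoop 0.20.x default is -Xmx200m; -Xmx1024m below is an
     illustrative assumption, not a tuned value. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

The property is job-configurable, so it can also be overridden per job rather than cluster-wide; otherwise restart the TaskTracker for the change to take effect.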
On Wed, Dec 17, 2014 at 10:55 AM, Andrew Musselman <[email protected]> wrote:
>
> It's worth trying to increase the heap size for child JVMs per this doc,
> depending on what version you're running:
> http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
>
> On Tue, Dec 16, 2014 at 11:33 PM, 万代豊 <[email protected]> wrote:
>>
>> Hi,
>> After several successful Mahout k-means jobs in the past, I'm suddenly
>> hitting the heap error below in the Mahout seq2sparse step
>> (Mahout 0.7 on Hadoop 0.20.203, pseudo-distributed).
>>
>> [hadoop@localhost TEST]$ $MAHOUT_HOME/bin/mahout seq2sparse --namedVector -i TEST/TEST-seqfile/ -o TEST/TEST-namedVector -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tfidf -s 5 -md 3 -x 90 -ml 50 -seq -n 2
>> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
>> MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
>> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
>> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
>> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
>> 14/12/16 22:52:57 INFO input.FileInputFormat: Total input paths to process : 10
>> 14/12/16 22:52:57 INFO mapred.JobClient: Running job: job_201412162229_0005
>> 14/12/16 22:52:58 INFO mapred.JobClient:  map 0% reduce 0%
>> 14/12/16 22:53:27 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_0, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:53:29 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_0, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:53:40 INFO mapred.JobClient:  map 2% reduce 0%
>> 14/12/16 22:53:42 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_1, Status : FAILED
>> Error: Java heap space
>> attempt_201412162229_0005_m_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
>> attempt_201412162229_0005_m_000000_1: log4j:WARN Please initialize the log4j system properly.
>> 14/12/16 22:53:43 INFO mapred.JobClient:  map 0% reduce 0%
>> 14/12/16 22:53:48 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_1, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:54:00 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_2, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:54:03 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_2, Status : FAILED
>> Error: Java heap space
>> 14/12/16 22:54:21 INFO mapred.JobClient: Job complete: job_201412162229_0005
>> 14/12/16 22:54:21 INFO mapred.JobClient: Counters: 7
>> 14/12/16 22:54:21 INFO mapred.JobClient:   Job Counters
>> 14/12/16 22:54:21 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=52527
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Launched map tasks=8
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Data-local map tasks=8
>> 14/12/16 22:54:21 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
>> 14/12/16 22:54:21 INFO mapred.JobClient:     Failed map tasks=1
>> Exception in thread "main" java.lang.IllegalStateException: Job failed!
>>     at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
>>     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:253)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>>     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>     at java.lang.reflect.Method.invoke(Method.java:597)
>>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>>
>> From looking through other threads, I'm already giving Mahout 2048 MB:
>>
>> [hadoop@localhost TEST]$ echo $MAHOUT_HEAPSIZE
>> 2048
>>
>> I'm not sure why, but JConsole's connection to the Mahout job is rejected
>> when I try to check heap status; however, in sync with each connection
>> request from JConsole, Mahout dumps heap-related information to the
>> console, as below.
>>
>> "VM Thread" prio=10 tid=0x00007fbe1405d000 nid=0x3c01 runnable
>>
>> "VM Periodic Task Thread" prio=10 tid=0x00007fbe14094000 nid=0x3c08 waiting on condition
>>
>> JNI global references: 1621
>>
>> Heap
>>  def new generation   total 18432K, used 16880K [0x00000000bc600000, 0x00000000bd9f0000, 0x00000000d1350000)
>>   eden space 16448K,  98% used [0x00000000bc600000, 0x00000000bd5d7b28, 0x00000000bd610000)
>>   from space 1984K,  33% used [0x00000000bd800000, 0x00000000bd8a47f8, 0x00000000bd9f0000)
>>   to   space 1984K,   0% used [0x00000000bd610000, 0x00000000bd610000, 0x00000000bd800000)
>>  tenured generation   total 40832K, used 464K [0x00000000d1350000, 0x00000000d3b30000, 0x00000000fae00000)
>>    the space 40832K,   1% used [0x00000000d1350000, 0x00000000d13c40e8, 0x00000000d13c4200, 0x00000000d3b30000)
>>  compacting perm gen  total 21248K, used 15218K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000)
>>    the space 21248K,  71% used [0x00000000fae00000, 0x00000000fbcdca18, 0x00000000fbcdcc00, 0x00000000fc2c0000)
>> No shared spaces configured.
>>
>> 14/12/16 23:18:40 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000000_1, Status : FAILED
>> Error: Java heap space
>> 14/12/16 23:18:41 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000001_1, Status : FAILED
>> Error: Java heap space
>> 2014-12-16 23:18:41
>> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.10-b01 mixed mode):
>>
>> In the above, 98% of the heap's Eden space is consumed, if only for a
>> short period. (Does that mean the heap is really running out?)
>>
>> Please advise which settings in Hadoop and/or Mahout I should increase,
>> and ideally by how much.
>>
>> Regards,
>> Y.Mandai
>
