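It's worth trying to increase the heap size for the child (task) JVMs as described in this doc, depending on which Hadoop version you're running: http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
The "Java heap space" errors are reported by the map task attempts, and each attempt runs in its own JVM started by the TaskTracker, so MAHOUT_HEAPSIZE does not change their limit.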
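For the Hadoop 0.20.x line you mention, the child JVM options are controlled by the mapred.child.java.opts property, whose default is -Xmx200m. A minimal sketch of raising it in mapred-site.xml follows; the 1024 MB value is only an illustrative assumption, not a tested recommendation for your data. On Hadoop 2.x the equivalent properties are mapreduce.map.java.opts and mapreduce.reduce.java.opts.

```xml
<!-- conf/mapred-site.xml, using the Hadoop 0.20.x / 1.x property name -->
<!-- -Xmx1024m is an assumed example value; size it to your RAM and slot count -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m</value>
</property>
```

In pseudo-distributed mode every map and reduce slot can grow to this size at the same time, so keep (number of slots) x (child heap) comfortably below the machine's physical memory.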
On Tue, Dec 16, 2014 at 11:33 PM, 万代豊 wrote:
>
> Hi
> After my several successful jobs experiences on other Mahout Kmeans
> calculation in the past , I'm facing a sudden heap error as below in Mahout
> seq2sparse process.(Mahout-0.70 on Hadoop-0.20.203 Pseudo-distributed)
>
> [hadoop@localhost TEST]$ $MAHOUT_HOME/bin/mahout seq2sparse --namedVector -i TEST/TEST-seqfile/ -o TEST/TEST-namedVector -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tfidf -s 5 -md 3 -x 90 -ml 50 -seq -n 2
> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
> 14/12/16 22:52:57 INFO input.FileInputFormat: Total input paths to process : 10
> 14/12/16 22:52:57 INFO mapred.JobClient: Running job: job_201412162229_0005
> 14/12/16 22:52:58 INFO mapred.JobClient: map 0% reduce 0%
> 14/12/16 22:53:27 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_0, Status : FAILED
> Error: Java heap space
> 14/12/16 22:53:29 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_0, Status : FAILED
> Error: Java heap space
> 14/12/16 22:53:40 INFO mapred.JobClient: map 2% reduce 0%
> 14/12/16 22:53:42 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_1, Status : FAILED
> Error: Java heap space
> attempt_201412162229_0005_m_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
> attempt_201412162229_0005_m_000000_1: log4j:WARN Please initialize the log4j system properly.
> 14/12/16 22:53:43 INFO mapred.JobClient: map 0% reduce 0%
> 14/12/16 22:53:48 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_1, Status : FAILED
> Error: Java heap space
> 14/12/16 22:54:00 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_2, Status : FAILED
> Error: Java heap space
> 14/12/16 22:54:03 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_2, Status : FAILED
> Error: Java heap space
> 14/12/16 22:54:21 INFO mapred.JobClient: Job complete: job_201412162229_0005
> 14/12/16 22:54:21 INFO mapred.JobClient: Counters: 7
> 14/12/16 22:54:21 INFO mapred.JobClient: Job Counters
> 14/12/16 22:54:21 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=52527
> 14/12/16 22:54:21 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
> 14/12/16 22:54:21 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
> 14/12/16 22:54:21 INFO mapred.JobClient: Launched map tasks=8
> 14/12/16 22:54:21 INFO mapred.JobClient: Data-local map tasks=8
> 14/12/16 22:54:21 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
> 14/12/16 22:54:21 INFO mapred.JobClient: Failed map tasks=1
> Exception in thread "main" java.lang.IllegalStateException: Job failed!
>     at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
>     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:253)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
>     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>     at java.lang.reflect.Method.invoke(Method.java:597)
>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> Through looking up into other threads, I'm giving 2048 MB to Mahout as
> below.
> [hadoop@localhost TEST]$ echo $MAHOUT_HEAPSIZE
> 2048
>
> Not sure why, connection to Mahout job from JConsole will be
> rejected,trying to check the heap status, however Mahout will dump heap
> related information on console as below in synch with connection request
> from JConsole.
>
>
> "VM Thread" prio=10 tid=0x00007fbe1405d000 nid=0x3c01 runnable
>
> "VM Periodic Task Thread" prio=10 tid=0x00007fbe14094000 nid=0x3c08 waiting on condition
>
> JNI global references: 1621
>
> Heap
> def new generation total 18432K, used 16880K [0x00000000bc600000, 0x00000000bd9f0000, 0x00000000d1350000)
> eden space 16448K, 98% used [0x00000000bc600000, 0x00000000bd5d7b28, 0x00000000bd610000)
> from space 1984K, 33% used [0x00000000bd800000, 0x00000000bd8a47f8, 0x00000000bd9f0000)
> to space 1984K, 0% used [0x00000000bd610000, 0x00000000bd610000, 0x00000000bd800000)
> tenured generation total 40832K, used 464K [0x00000000d1350000, 0x00000000d3b30000, 0x00000000fae00000)
> the space 40832K, 1% used [0x00000000d1350000, 0x00000000d13c40e8, 0x00000000d13c4200, 0x00000000d3b30000)
> compacting perm gen total 21248K, used 15218K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000)
> the space 21248K, 71% used [0x00000000fae00000, 0x00000000fbcdca18, 0x00000000fbcdcc00, 0x00000000fc2c0000)
> No shared spaces configured.
>
> 14/12/16 23:18:40 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000000_1, Status : FAILED
> Error: Java heap space
> 14/12/16 23:18:41 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000001_1, Status : FAILED
> Error: Java heap space
> 2014-12-16 23:18:41
> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.10-b01 mixed mode):
>
> In above, although it is only for a short period of time 98% of Eden Space
> on heap seems to be consumed.(Guess that heap is really running out??)
>
> Please advise me in terms of what variable names in whatever in
> Hadoop/Mahout should be increased (and may be how much.)
>
> Regards,,,
> Y.Mandai
>

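On the heap dump quoted above: a rough check of its address ranges suggests it comes from a JVM capped at about 1000 MB, which matches Hadoop's default client heap rather than either the 2048 MB given to Mahout or a failing map task. This is only a sketch, assuming HotSpot prints each generation as [start, committed end, reserved end); the variable names are mine, and the addresses are copied from the dump.

```shell
# Addresses copied from the quoted heap dump.
# Assumption: each generation is printed as [start, committed end, reserved end).
new_gen_start=0xbc600000
tenured_start=0xd1350000   # also the reserved end of the new generation
tenured_end=0xfae00000     # reserved end of the tenured generation

mb=$((1024 * 1024))
# Integer division, so fractional megabytes are truncated.
new_gen_max_mb=$(( ($tenured_start - $new_gen_start) / $mb ))
tenured_max_mb=$(( ($tenured_end - $tenured_start) / $mb ))
heap_max_mb=$(( ($tenured_end - $new_gen_start) / $mb ))

echo "new gen max: ${new_gen_max_mb} MB"
echo "tenured max: ${tenured_max_mb} MB"
echo "heap max:    ${heap_max_mb} MB"
```

The total works out to 1000 MB, of which only about 58 MB is committed and the tenured generation is 1% used. Eden sitting at 98% in that state is normal just before a minor collection, so this dump on its own does not show the heap running out. The failing map attempts have their own, separately configured heap.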