Hi,

After several successful Mahout k-means jobs in the past, I am suddenly facing the heap error below in the Mahout seq2sparse step (Mahout 0.7 on Hadoop 0.20.203, pseudo-distributed).
[hadoop@localhost TEST]$ $MAHOUT_HOME/bin/mahout seq2sparse --namedVector -i TEST/TEST-seqfile/ -o TEST/TEST-namedVector -ow -a org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tfidf -s 5 -md 3 -x 90 -ml 50 -seq -n 2
Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
14/12/16 22:52:57 INFO input.FileInputFormat: Total input paths to process : 10
14/12/16 22:52:57 INFO mapred.JobClient: Running job: job_201412162229_0005
14/12/16 22:52:58 INFO mapred.JobClient: map 0% reduce 0%
14/12/16 22:53:27 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_0, Status : FAILED
Error: Java heap space
14/12/16 22:53:29 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_0, Status : FAILED
Error: Java heap space
14/12/16 22:53:40 INFO mapred.JobClient: map 2% reduce 0%
14/12/16 22:53:42 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_1, Status : FAILED
Error: Java heap space
attempt_201412162229_0005_m_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
attempt_201412162229_0005_m_000000_1: log4j:WARN Please initialize the log4j system properly.
14/12/16 22:53:43 INFO mapred.JobClient: map 0% reduce 0%
14/12/16 22:53:48 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_1, Status : FAILED
Error: Java heap space
14/12/16 22:54:00 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_2, Status : FAILED
Error: Java heap space
14/12/16 22:54:03 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_2, Status : FAILED
Error: Java heap space
14/12/16 22:54:21 INFO mapred.JobClient: Job complete: job_201412162229_0005
14/12/16 22:54:21 INFO mapred.JobClient: Counters: 7
14/12/16 22:54:21 INFO mapred.JobClient: Job Counters
14/12/16 22:54:21 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=52527
14/12/16 22:54:21 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
14/12/16 22:54:21 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
14/12/16 22:54:21 INFO mapred.JobClient: Launched map tasks=8
14/12/16 22:54:21 INFO mapred.JobClient: Data-local map tasks=8
14/12/16 22:54:21 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
14/12/16 22:54:21 INFO mapred.JobClient: Failed map tasks=1
Exception in thread "main" java.lang.IllegalStateException: Job failed!
    at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:253)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
    at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
    at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

After looking through other threads, I am giving Mahout 2048 MB, as shown here:

[hadoop@localhost TEST]$ echo $MAHOUT_HEAPSIZE
2048

I tried to check the heap status with JConsole, but for some reason its connection to the Mahout job is rejected. However, each time JConsole requests a connection, Mahout dumps heap-related information to the console, as below.

"VM Thread" prio=10 tid=0x00007fbe1405d000 nid=0x3c01 runnable
"VM Periodic Task Thread" prio=10 tid=0x00007fbe14094000 nid=0x3c08 waiting on condition
JNI global references: 1621
Heap
 def new generation total 18432K, used 16880K [0x00000000bc600000, 0x00000000bd9f0000, 0x00000000d1350000)
  eden space 16448K, 98% used [0x00000000bc600000, 0x00000000bd5d7b28, 0x00000000bd610000)
  from space 1984K, 33% used [0x00000000bd800000, 0x00000000bd8a47f8, 0x00000000bd9f0000)
  to space 1984K, 0% used [0x00000000bd610000, 0x00000000bd610000, 0x00000000bd800000)
 tenured generation total 40832K, used 464K [0x00000000d1350000, 0x00000000d3b30000, 0x00000000fae00000)
  the space 40832K, 1% used [0x00000000d1350000, 0x00000000d13c40e8, 0x00000000d13c4200, 0x00000000d3b30000)
 compacting perm gen total 21248K, used 15218K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000)
  the space 21248K, 71% used [0x00000000fae00000, 0x00000000fbcdca18, 0x00000000fbcdcc00, 0x00000000fc2c0000)
No shared spaces configured.
14/12/16 23:18:40 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000000_1, Status : FAILED
Error: Java heap space
14/12/16 23:18:41 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000001_1, Status : FAILED
Error: Java heap space
2014-12-16 23:18:41
Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.10-b01 mixed mode):

In the dump above, 98% of the eden space on the heap appears to be consumed, although only for a short period of time. (I guess the heap is really running out?)

Please advise which settings in Hadoop or Mahout should be increased, and if possible by how much.

Regards,
Y.Mandai

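To put the eden figure in context, the generation lines of the dump can be read a little more closely. The following is only a minimal sketch: it assumes that the three addresses in each bracket are the start of the generation, the end of its committed memory, and the end of its reserved memory, so that the last minus the first gives the most the generation could grow to. The helper name parse_generations is made up for illustration.

```python
import re

# The two heap generation lines, copied from the dump above.
DUMP = """\
def new generation total 18432K, used 16880K [0x00000000bc600000, 0x00000000bd9f0000, 0x00000000d1350000)
tenured generation total 40832K, used 464K [0x00000000d1350000, 0x00000000d3b30000, 0x00000000fae00000)
"""

# Assumption: the bracket holds [start, committed end, reserved end).
PATTERN = re.compile(
    r"(?P<name>.+?) generation\s+total (?P<total>\d+)K, used (?P<used>\d+)K "
    r"\[(?P<low>0x[0-9a-f]+), (?P<committed>0x[0-9a-f]+), (?P<reserved>0x[0-9a-f]+)\)"
)


def parse_generations(text):
    """Return current and reserved sizes (in KB) for each generation line."""
    generations = {}
    for match in PATTERN.finditer(text):
        low = int(match.group("low"), 16)
        reserved_end = int(match.group("reserved"), 16)
        generations[match.group("name").strip()] = {
            "total_kb": int(match.group("total")),
            "used_kb": int(match.group("used")),
            "reserved_kb": (reserved_end - low) // 1024,
        }
    return generations


generations = parse_generations(DUMP)
# Sum of the reserved ranges of the young and tenured generations, in MB.
max_heap_mb = sum(g["reserved_kb"] for g in generations.values()) / 1024

for name, g in generations.items():
    print(f"{name}: used {g['used_kb']}K of {g['total_kb']}K now, "
          f"can grow to {g['reserved_kb']}K")
print(f"reserved heap (young + tenured): {max_heap_mb} MB")
```

Under that assumption the two reserved ranges add up to 1000 MB, and the tenured generation is almost empty. If that reading is right, the dumped JVM is capped near 1000 MB and not at the 2048 MB set through MAHOUT_HEAPSIZE, and a full eden on its own would not show that this JVM's heap is exhausted.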