Thanks Andrew. I believe I had specified a heap size somewhere long ago, but there was in fact no heap-related entry in my mapred-site.xml. I added the mapred.child.java.opts property with -Xmx2024m, and the job then ran to completion successfully.
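For reference, the entry described above would look roughly like the following in mapred-site.xml (property name and value as stated; the exact file location depends on the installation):

```xml
<!-- mapred-site.xml: heap size for map/reduce child (task) JVMs -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx2024m</value>
</property>
```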
I guess I should do the upgrade as well. Thanks.
Y.Mandai

2014-12-18 3:57 GMT+09:00 Andrew Musselman <[email protected]>:

> But also please update to Mahout version 0.9, since you're two versions
> behind.
>
> On Wed, Dec 17, 2014 at 10:55 AM, Andrew Musselman <[email protected]> wrote:
>
> > It's worth trying to increase the heap size for child JVMs per this doc,
> > depending on what version you're running:
> > http://hadoop.apache.org/docs/r2.5.1/hadoop-project-dist/hadoop-common/ClusterSetup.html
> >
> > On Tue, Dec 16, 2014 at 11:33 PM, 万代豊 <[email protected]> wrote:
> >
> >> Hi,
> >> After several successful runs of other Mahout k-means jobs in the past,
> >> I'm suddenly hitting the heap error below in Mahout's seq2sparse step
> >> (Mahout 0.7 on Hadoop 0.20.203, pseudo-distributed).
> >>
> >> [hadoop@localhost TEST]$ $MAHOUT_HOME/bin/mahout seq2sparse --namedVector \
> >>   -i TEST/TEST-seqfile/ -o TEST/TEST-namedVector -ow \
> >>   -a org.apache.lucene.analysis.WhitespaceAnalyzer -chunk 200 -wt tfidf \
> >>   -s 5 -md 3 -x 90 -ml 50 -seq -n 2
> >> Running on hadoop, using /usr/local/hadoop/bin/hadoop and HADOOP_CONF_DIR=
> >> MAHOUT-JOB: /usr/local/mahout-distribution-0.7/mahout-examples-0.7-job.jar
> >> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Maximum n-gram size is: 1
> >> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Minimum LLR value: 50.0
> >> 14/12/16 22:52:55 INFO vectorizer.SparseVectorsFromSequenceFiles: Number of reduce tasks: 1
> >> 14/12/16 22:52:57 INFO input.FileInputFormat: Total input paths to process : 10
> >> 14/12/16 22:52:57 INFO mapred.JobClient: Running job: job_201412162229_0005
> >> 14/12/16 22:52:58 INFO mapred.JobClient:  map 0% reduce 0%
> >> 14/12/16 22:53:27 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_0, Status : FAILED
> >> Error: Java heap space
> >> 14/12/16 22:53:29 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_0, Status : FAILED
> >> Error: Java heap space
> >> 14/12/16 22:53:40 INFO mapred.JobClient:  map 2% reduce 0%
> >> 14/12/16 22:53:42 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_1, Status : FAILED
> >> Error: Java heap space
> >> attempt_201412162229_0005_m_000000_1: log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapred.Task).
> >> attempt_201412162229_0005_m_000000_1: log4j:WARN Please initialize the log4j system properly.
> >> 14/12/16 22:53:43 INFO mapred.JobClient:  map 0% reduce 0%
> >> 14/12/16 22:53:48 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_1, Status : FAILED
> >> Error: Java heap space
> >> 14/12/16 22:54:00 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000000_2, Status : FAILED
> >> Error: Java heap space
> >> 14/12/16 22:54:03 INFO mapred.JobClient: Task Id : attempt_201412162229_0005_m_000001_2, Status : FAILED
> >> Error: Java heap space
> >> 14/12/16 22:54:21 INFO mapred.JobClient: Job complete: job_201412162229_0005
> >> 14/12/16 22:54:21 INFO mapred.JobClient: Counters: 7
> >> 14/12/16 22:54:21 INFO mapred.JobClient:   Job Counters
> >> 14/12/16 22:54:21 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=52527
> >> 14/12/16 22:54:21 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
> >> 14/12/16 22:54:21 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
> >> 14/12/16 22:54:21 INFO mapred.JobClient:     Launched map tasks=8
> >> 14/12/16 22:54:21 INFO mapred.JobClient:     Data-local map tasks=8
> >> 14/12/16 22:54:21 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
> >> 14/12/16 22:54:21 INFO mapred.JobClient:     Failed map tasks=1
> >> Exception in thread "main" java.lang.IllegalStateException: Job failed!
> >>     at org.apache.mahout.vectorizer.DocumentProcessor.tokenizeDocuments(DocumentProcessor.java:95)
> >>     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.run(SparseVectorsFromSequenceFiles.java:253)
> >>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> >>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
> >>     at org.apache.mahout.vectorizer.SparseVectorsFromSequenceFiles.main(SparseVectorsFromSequenceFiles.java:55)
> >>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>     at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
> >>     at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
> >>     at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:195)
> >>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >>     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >>     at java.lang.reflect.Method.invoke(Method.java:597)
> >>     at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> >>
> >> Going by other threads, I'm already giving Mahout 2048 MB:
> >> [hadoop@localhost TEST]$ echo $MAHOUT_HEAPSIZE
> >> 2048
> >>
> >> I'm not sure why, but when I try to check heap status with JConsole, the
> >> connection to the Mahout job is rejected. Mahout does, however, dump
> >> heap-related information to the console (below) in sync with each
> >> connection request from JConsole.
> >>
> >> "VM Thread" prio=10 tid=0x00007fbe1405d000 nid=0x3c01 runnable
> >>
> >> "VM Periodic Task Thread" prio=10 tid=0x00007fbe14094000 nid=0x3c08 waiting on condition
> >>
> >> JNI global references: 1621
> >>
> >> Heap
> >>  def new generation   total 18432K, used 16880K [0x00000000bc600000, 0x00000000bd9f0000, 0x00000000d1350000)
> >>   eden space 16448K,  98% used [0x00000000bc600000, 0x00000000bd5d7b28, 0x00000000bd610000)
> >>   from space 1984K,   33% used [0x00000000bd800000, 0x00000000bd8a47f8, 0x00000000bd9f0000)
> >>   to   space 1984K,    0% used [0x00000000bd610000, 0x00000000bd610000, 0x00000000bd800000)
> >>  tenured generation   total 40832K, used 464K [0x00000000d1350000, 0x00000000d3b30000, 0x00000000fae00000)
> >>   the space 40832K,    1% used [0x00000000d1350000, 0x00000000d13c40e8, 0x00000000d13c4200, 0x00000000d3b30000)
> >>  compacting perm gen  total 21248K, used 15218K [0x00000000fae00000, 0x00000000fc2c0000, 0x0000000100000000)
> >>   the space 21248K,   71% used [0x00000000fae00000, 0x00000000fbcdca18, 0x00000000fbcdcc00, 0x00000000fc2c0000)
> >> No shared spaces configured.
> >>
> >> 14/12/16 23:18:40 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000000_1, Status : FAILED
> >> Error: Java heap space
> >> 14/12/16 23:18:41 INFO mapred.JobClient: Task Id : attempt_201412162229_0007_m_000001_1, Status : FAILED
> >> Error: Java heap space
> >> 2014-12-16 23:18:41
> >> Full thread dump Java HotSpot(TM) 64-Bit Server VM (20.10-b01 mixed mode):
> >>
> >> In the dump above, 98% of the eden space is in use, if only briefly, so
> >> it does look as though the heap really is running out.
> >>
> >> Please advise which settings in Hadoop and/or Mahout I should increase,
> >> and by roughly how much.
> >>
> >> Regards,
> >> Y.Mandai
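As a side note on why raising MAHOUT_HEAPSIZE alone did not help in the thread above: in Hadoop 0.20.x the bin/mahout script's MAHOUT_HEAPSIZE sizes only the local client/driver JVM, while the map tasks that actually printed "Error: Java heap space" run in separate child JVMs whose heap is taken from mapred.child.java.opts (typically a small -Xmx200m by default). A minimal sketch of the distinction, with illustrative values matching this thread:

```shell
# Sketch (illustrative values): the two separate heap settings involved.

# MAHOUT_HEAPSIZE sizes only the client/driver JVM started by bin/mahout.
export MAHOUT_HEAPSIZE=2048

# The failing map tasks run in child JVMs; their heap comes from the
# mapred.child.java.opts property in mapred-site.xml instead.
CHILD_JAVA_OPTS="-Xmx2024m"

echo "driver heap: ${MAHOUT_HEAPSIZE}m; task-JVM heap: ${CHILD_JAVA_OPTS}"
```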
