Ahhh, apologies for badmouthing Hadoop... I finally discovered one problem that may have caused this kind of performance degradation. After growing the data set even larger, to 35GB, Hadoop crashed with a disk full error. It would appear that the system will continue to work when the disk is almost full, but something causes it to slow down. Does HDFS juggle blocks around when there isn't enough space on a slave machine? That would explain why it was slowing down so much when the filesystem was nearly full... Separately, I've updated my expectation of the speedup: once I account for the sort that is happening, it is indeed faster.
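Since then I've been watching free space more closely. In case it's useful to anyone else, this is roughly what I'm doing; the data directory path below is just an example, not a value from my config:

  # Per-datanode capacity and remaining space, as HDFS sees it:
  bin/hadoop dfsadmin -report

  # Raw disk usage on every slave (assumes the default slaves file):
  bin/slaves.sh df -h /path/to/dfs/data

And to keep HDFS from filling a disk completely, I'm considering reserving some headroom per volume in hadoop-site.xml. The 1GB value is an arbitrary example, not a recommendation:

  <!-- Bytes per volume that HDFS must leave free for non-DFS use. -->
  <property>
    <name>dfs.datanode.du.reserved</name>
    <value>1073741824</value>
  </property>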
I've upgraded to 0.18.2, and I now see the exception that is slowing down the reducer near the end of the run (pasted below). Any suggestions on this one? (A config tweak I'm thinking of trying is sketched at the bottom of this message, after the quoted thread.)

    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
2008-12-12 12:06:28,403 WARN /: /mapOutput?job=job_200812121139_0001&map=attempt_200812121139_0001_m_000114_0&reduce=2: java.lang.IllegalStateException: Committed
    at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
    at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
    at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2504)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
    at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
    at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
    at org.mortbay.http.HttpServer.service(HttpServer.java:954)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
2008-12-12 12:06:31,107 INFO org.apache.hadoop.mapred.TaskTracker: attempt_200812121139_0001_r_000007_0 0.29801327% reduce > copy (135 of 151 at 1.42 MB/s) >

On Thu, Dec 11, 2008 at 12:13 PM, hc busy <[email protected]> wrote:

> And aside from refusing to declare the task complete after everything is
> at 100%, I also noticed that the mapper seems too slow. It's taking the
> same amount of time for 4 machines to read and write through the 30GB file
> as if I did it with a /bin/cat on one machine. Do you guys have any
> suggestions with regards to these two problems?
>
> On Wed, Dec 10, 2008 at 4:37 PM, hc busy <[email protected]> wrote:
>
>> Guys, I've just configured a Hadoop cluster for the first time, and I'm
>> running a null map-reduction over the streaming interface (/bin/cat for
>> both map and reducer). I noticed that the mapper and reducer complete
>> 100% in the web UI within a reasonable amount of time, but the job does
>> not complete. On the command line it displays
>>
>> ...INFO streaming.StreamJob: map 100% reduce 100%
>>
>> In the web UI, the map completion graph shows 100%, but no reduce
>> completion graph is displayed. The four machines are well equipped to
>> handle the size of the data (30GB).
>> Looking at the task tracker on each of the machines, I noticed that it
>> is ticking down the percentages very, very slowly:
>>
>> 2008-12-10 16:18:55,265 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000002_0 46.684883% Records R/W=149326846/149326834 > reduce
>> 2008-12-10 16:18:57,055 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000006_0 47.566963% Records R/W=151739348/151739342 > reduce
>> 2008-12-10 16:18:58,268 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000002_0 46.826576% Records R/W=149326846/149326834 > reduce
>> 2008-12-10 16:19:00,058 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000006_0 47.741756% Records R/W=153377016/153376990 > reduce
>> 2008-12-10 16:19:01,271 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000002_0 46.9636% Records R/W=149326846/149326834 > reduce
>> 2008-12-10 16:19:03,061 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000006_0 47.94259% Records R/W=153377016/153376990 > reduce
>> 2008-12-10 16:19:04,274 INFO org.apache.hadoop.mapred.TaskTracker: task_200812101532_0001_r_000002_0 47.110992% Records R/W=150960648/150960644 > reduce
>>
>> It would continue like this for hours and hours. What buffer am I setting
>> too small, or what could possibly make it go so slow? I've worked on
>> Hadoop clusters before, and they have always performed great on
>> similar-sized or larger data sets, so I suspect it's just a configuration
>> somewhere that is making it do this?
>>
>> Thanks in advance.
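One thing I'm planning to try for the Jetty "Committed" exception above is giving the TaskTracker's embedded HTTP server more threads for serving map outputs, on the theory that the shuffle servlet is saturating its pool. The value below is a guess to experiment with, not a tested setting:

  <!-- hadoop-site.xml: worker threads for the TaskTracker's HTTP server,
       which serves map output to reducers. The 0.18 default is 40. -->
  <property>
    <name>tasktracker.http.threads</name>
    <value>100</value>
  </property>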
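For reference, the null streaming job described in the quoted message looks something like this; the jar path and the input/output directories are placeholders and will vary by install:

  bin/hadoop jar contrib/streaming/hadoop-0.18.2-streaming.jar \
      -input /user/hc/in \
      -output /user/hc/out \
      -mapper /bin/cat \
      -reducer /bin/cat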
