ahhh, apologies for badmouthing hadoop...

So, I finally discovered one problem that may have caused this kind of
performance degradation. After growing the data set even larger, to 35 GB,
Hadoop crashed with a disk-full error. It would appear that the system will
actually continue to work when the disk is almost full, but something causes
it to slow down. Does HDFS juggle blocks around when there isn't enough
space on a slave machine? That would explain why it was slowing down so much
when the fs is almost full... Another part of this is that I've updated my
expectation of the speedup: accounting for the sort that is happening, it is
indeed faster.
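One thing I plan to try, in case low free space really is the culprit, is reserving some headroom per datanode so the volumes never fill completely. A sketch of what I mean for 0.18.x (the value is in bytes; please double-check the property name and default for your version):

```xml
<!-- hadoop-site.xml: reserve ~2 GB per volume for non-DFS use, so the
     datanode stops accepting new blocks before the disk is truly full. -->
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>2147483648</value>
</property>
```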

I've upgraded to 0.18.2, and I now see the exception that is slowing down
the reducer near the end of the run (pasted below). Any suggestions on this
one?


2008-12-12 12:06:28,403 WARN /: /mapOutput?job=job_200812121139_0001&map=attempt_200812121139_0001_m_000114_0&reduce=2:
java.lang.IllegalStateException: Committed
  at org.mortbay.jetty.servlet.ServletHttpResponse.resetBuffer(ServletHttpResponse.java:212)
  at org.mortbay.jetty.servlet.ServletHttpResponse.sendError(ServletHttpResponse.java:375)
  at org.apache.hadoop.mapred.TaskTracker$MapOutputServlet.doGet(TaskTracker.java:2504)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
  at javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
  at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:427)
  at org.mortbay.jetty.servlet.WebApplicationHandler.dispatch(WebApplicationHandler.java:475)
  at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:567)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1565)
  at org.mortbay.jetty.servlet.WebApplicationContext.handle(WebApplicationContext.java:635)
  at org.mortbay.http.HttpContext.handle(HttpContext.java:1517)
  at org.mortbay.http.HttpServer.service(HttpServer.java:954)
  at org.mortbay.http.HttpConnection.service(HttpConnection.java:814)
  at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:981)
  at org.mortbay.http.HttpConnection.handle(HttpConnection.java:831)
  at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:244)
  at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
  at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
2008-12-12 12:06:31,107 INFO org.apache.hadoop.mapred.TaskTracker:
attempt_200812121139_0001_r_000007_0 0.29801327% reduce > copy (135 of 151
at 1.42 MB/s) >
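Incidentally, the progress number in that last log line checks out: a reduce task reports its copy, sort, and reduce phases as thirds of the total, so having fetched 135 of 151 map outputs during the copy phase gives (135/151)/3, which agrees with the logged 0.29801327. Just the arithmetic, for the record:

```python
# Reduce progress is split into thirds: copy, sort, reduce.
# During the copy phase: progress = (fetched map outputs / total) / 3.
fetched, total = 135, 151
progress = (fetched / total) / 3
print(progress)  # ~0.298013, agreeing with the logged 0.29801327
```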

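On the "Committed" warning itself: as far as I can tell it shows up when a reducer drops its HTTP connection partway through fetching a map output, and the servlet then fails trying to sendError() on a response that has already been committed. The fetch gets retried, but a lot of these could slow the shuffle down. One knob I'm considering trying is giving the TaskTracker's embedded Jetty more worker threads for serving map output (a sketch; I believe the default in this era is 40, but please verify for your version):

```xml
<!-- hadoop-site.xml: more HTTP threads for serving map output
     to reducers during the shuffle. -->
<property>
  <name>tasktracker.http.threads</name>
  <value>80</value>
</property>
```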

On Thu, Dec 11, 2008 at 12:13 PM, hc busy <[email protected]> wrote:

> And aside from refusing to declare task complete after everything is 100%,
> I also noticed that the mapper seems too slow. It's taking the same amount of
> time for 4 machines to read and write through the 30gb file as if I did it
> with a /bin/cat on one machine. Do you guys have any suggestions with
> regards to these two problems?
> On Wed, Dec 10, 2008 at 4:37 PM, hc busy <[email protected]> wrote:
>
>> Guys, I've just configured a hadoop cluster for the first time, and I'm
>> running a null map-reduction over the streaming interface. (/bin/cat for
>> both map and reducer). So I noticed that the mapper and reducer complete
>> 100% in the web ui within a reasonable amount of time, but the job does not
>> complete. On command line it displays
>>
>> ...INFO streaming.StreamJob: map 100% reduce 100%
>>
>> In the web ui, it shows map completion graph is 100%, but does not display
>> a reduce completion graph. The four machines are well equipped to handle the
>> size of data (30gb). Looking at the task tracker on each of the machines, I
>> noticed that it is ticking down the percents very very slowly:
>>
>> 2008-12-10 16:18:55,265 INFO org.apache.hadoop.mapred.TaskTracker:
>> task_200812101532_0001_r_000002_0 46.684883% Records R/W=149326846/149326834
>> > reduce
>> 2008-12-10 16:18:57,055 INFO org.apache.hadoop.mapred.TaskTracker:
>> task_200812101532_0001_r_000006_0 47.566963% Records R/W=151739348/151739342
>> > reduce
>> 2008-12-10 16:18:58,268 INFO org.apache.hadoop.mapred.TaskTracker:
>> task_200812101532_0001_r_000002_0 46.826576% Records R/W=149326846/149326834
>> > reduce
>> 2008-12-10 16:19:00,058 INFO org.apache.hadoop.mapred.TaskTracker:
>> task_200812101532_0001_r_000006_0 47.741756% Records R/W=153377016/153376990
>> > reduce
>> 2008-12-10 16:19:01,271 INFO org.apache.hadoop.mapred.TaskTracker:
>> task_200812101532_0001_r_000002_0 46.9636% Records R/W=149326846/149326834 >
>> reduce
>> 2008-12-10 16:19:03,061 INFO org.apache.hadoop.mapred.TaskTracker:
>> task_200812101532_0001_r_000006_0 47.94259% Records R/W=153377016/153376990
>> > reduce
>> 2008-12-10 16:19:04,274 INFO org.apache.hadoop.mapred.TaskTracker:
>> task_200812101532_0001_r_000002_0 47.110992% Records R/W=150960648/150960644
>> > reduce
>>
>> so it would continue like this for hours and hours. What buffer am I
>> setting too small, or what could possibly make it go so slow?? I've worked
>> on hadoop clusters before and they had always performed great on similar-sized
>> or larger data sets, so I suspect it's just a configuration somewhere that
>> is making it do this?
>>
>> thanks in advance.
>>
>>
>>
>
