I've noticed that the TaskTracker moves all unpacked jars into
${hadoop.tmp.dir}/mapred/local/taskTracker.

We use a lot of external libraries, which are deployed via the "-libjars"
option. The total number of files after unpacking is about 20,000.
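To see where a figure like 20,000 comes from, one can count how many regular files each jar would produce if it were fully unpacked. This is only an illustrative sketch: UnpackedFileCount is a hypothetical helper, not part of Hadoop, and it assumes every entry of each jar gets unpacked (directory entries are not counted).

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/**
 * Hypothetical helper: estimates how many files the given jars would
 * produce if each one were fully unpacked. Directory entries are skipped.
 */
public class UnpackedFileCount {

    // Counts the non-directory entries in a single jar.
    public static int countFiles(String jarPath) throws IOException {
        int count = 0;
        try (JarFile jar = new JarFile(jarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                if (!entries.nextElement().isDirectory()) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        long total = 0;
        for (String jarPath : args) {
            int n = countFiles(jarPath);
            System.out.println(jarPath + ": " + n + " files");
            total += n;
        }
        System.out.println("total: " + total + " files");
    }
}
```

Usage: java UnpackedFileCount lib1.jar lib2.jar ... (pass the same jars that go to -libjars).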

After a number of jobs have run, tasks start getting killed with a timeout
("Task attempt_200901281518_0011_m_000173_2 failed to report status for 601
seconds. Killing!"). All killed tasks are in the "initializing" state. I
watched the TaskTracker logs and found messages like this:


Thread 20926 (Thread-10368):
  State: BLOCKED
  Blocked count: 3611
  Waited count: 24
  Blocked on java.lang.ref.reference$l...@e48ed6
  Blocked by 20882 (Thread-10341)
  Stack:
    java.lang.StringCoding$StringEncoder.encode(StringCoding.java:232)
    java.lang.StringCoding.encode(StringCoding.java:272)
    java.lang.String.getBytes(String.java:947)
    java.io.UnixFileSystem.getBooleanAttributes0(Native Method)
    java.io.UnixFileSystem.getBooleanAttributes(UnixFileSystem.java:228)
    java.io.File.isDirectory(File.java:754)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:427)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)
    org.apache.hadoop.fs.FileUtil.getDU(FileUtil.java:433)


This is exactly the problem described in HADOOP-4780.
As I understand it, the patch adds code that stores a map of directories along
with their disk usage (DU), thus reducing the number of calls to getDU. This
should help, but deleting 20,000 files still takes too long. I manually
deleted the archive after 10 jobs had run, and it took over 30 minutes on XFS.
That is three times longer than the default task timeout!
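The caching idea, as I understand it, can be sketched roughly as follows. This is a minimal illustration, not the HADOOP-4780 patch itself: DuCache and its methods are hypothetical names. An uncached recursive walk (similar in spirit to FileUtil.getDU in the stack trace above) calls File.isDirectory() once per entry, so its cost grows with the number of unpacked files; remembering each directory's total lets repeated queries skip the walk.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the actual patch): disk usage of a directory,
 * cached per directory so repeated queries do not re-walk the tree.
 */
public class DuCache {
    private final Map<File, Long> cache = new HashMap<>();
    // Counts isDirectory() calls, to show how much work each walk does.
    private long isDirectoryCalls = 0;

    /** Uncached walk: one isDirectory() call per file or directory visited. */
    public long walk(File f) {
        isDirectoryCalls++;
        if (!f.isDirectory()) {
            return f.length();
        }
        long size = 0;
        File[] children = f.listFiles();
        if (children != null) {
            for (File child : children) {
                size += walk(child);
            }
        }
        return size;
    }

    /** Cached lookup: walks a directory only the first time it is requested. */
    public long getDU(File dir) {
        Long cached = cache.get(dir);
        if (cached != null) {
            return cached;
        }
        long size = walk(dir);
        cache.put(dir, size);
        return size;
    }

    /** Drops a cached total, e.g. after the directory's contents change. */
    public void invalidate(File dir) {
        cache.remove(dir);
    }

    public long getIsDirectoryCalls() {
        return isDirectoryCalls;
    }
}
```

A cache like this only avoids repeated walks; it does nothing about the cost of deleting the unpacked files themselves.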

Is there a way to prevent the jars from being unpacked? Or at least to avoid
keeping the unpacked archive? Or is there a better way to solve this problem?

Hadoop version: 0.19.0.


-- 
Andrew Gudkov
PGP key id: CB9F07D8 (cryptonomicon.mit.edu)
