[ https://issues.apache.org/jira/browse/HADOOP-3120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HADOOP-3120.
--------------------------------------
    Resolution: Incomplete

I'm going to close this as stale.

> Large #of tasks failing at one time can effectively hang the jobtracker
> ------------------------------------------------------------------------
>
>                 Key: HADOOP-3120
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3120
>             Project: Hadoop Common
>          Issue Type: Bug
>         Environment: Linux/Hadoop-15.3
>            Reporter: Pete Wyckoff
>            Priority: Minor
>
> We think that JobTracker.removeMarkedTasks does so much logging when this
> happens (i.e. logging thousands of failed tasks per cycle) that nothing else
> can go on (since it is called from a synchronized method), and thus by the
> next cycle the next wave of jobs has failed and we again have tens of
> thousands of failures to log, and so on.
> At least, the above is what we observed: just a continual printing of those
> failures and nothing else happening. Of course the original jobs may have
> ultimately failed, but new jobs kept coming in and perpetuating the problem.
> This has happened to us a number of times, and since we commented out the
> log.info call in that method we haven't had any problems, although thousands
> and thousands of task failures are hopefully not that common.
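For illustration, here is a minimal, self-contained sketch of the contention pattern the report describes. It is not the actual JobTracker source; the class and method names below are hypothetical stand-ins. The point is that a synchronized cleanup method which logs once per failed task holds the tracker-wide lock for the whole pass, so any other synchronized method (e.g. heartbeat handling) cannot run until the logging finishes.

{code:java}
import java.util.ArrayList;
import java.util.List;

public class TrackerLockDemo {
    private final List<String> markedTasks = new ArrayList<>();

    // Hypothetical stand-in for the periodic cleanup: synchronized, so the
    // per-task logging below happens while every other synchronized method
    // on this object is blocked.
    public synchronized void removeMarkedTasks() {
        for (String taskId : markedTasks) {
            // With tens of thousands of failures per cycle, this line is
            // where most of the lock-hold time goes.
            System.out.println("Removing failed task '" + taskId + "'");
        }
        markedTasks.clear();
    }

    // Hypothetical stand-in for heartbeat handling; it needs the same lock.
    public synchronized void processHeartbeat(String trackerName) {
        // assign work, update tracker status, etc.
    }

    public synchronized void markFailedTask(String taskId) {
        markedTasks.add(taskId);
    }

    public static void main(String[] args) throws InterruptedException {
        TrackerLockDemo tracker = new TrackerLockDemo();
        for (int i = 0; i < 20_000; i++) {
            tracker.markFailedTask("task_" + i);
        }

        Thread cleanup = new Thread(tracker::removeMarkedTasks);
        cleanup.start();
        Thread.sleep(100); // give the cleanup thread time to grab the lock first

        long start = System.nanoTime();
        tracker.processHeartbeat("tracker_1"); // stalls until removeMarkedTasks() finishes
        System.out.printf("heartbeat waited %d ms for the lock%n",
                (System.nanoTime() - start) / 1_000_000);

        cleanup.join();
    }
}
{code}

Removing or rate-limiting the per-task log line (as the reporter did with the log.info call) shortens the critical section, so the other methods contending for the lock can make progress again.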