Oh, by default it's set to 0L.
I'll try setting it to 3 immediately. Thanks for the help!
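For reference, here is a minimal sketch of passing that property at submit time. The class and jar names below are hypothetical, and note that spark.scheduler.executorTaskBlacklistTime is measured in milliseconds, so a practical value is likely much larger than 3:

```shell
# Sketch: blacklist a failing executor for a task for 30 seconds.
# The property value is in milliseconds.
spark-submit \
  --conf spark.scheduler.executorTaskBlacklistTime=30000 \
  --class com.example.MyJob \
  my-job.jar
```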
Jianshi
On Mon, Mar 16, 2015 at 11:32 PM, Jianshi Huang wrote:
Thanks Shixiong!
Very strange that our tasks were retried on the same executor again and
again. I'll check spark.scheduler.executorTaskBlacklistTime.
Jianshi
On Mon, Mar 16, 2015 at 6:02 PM, Shixiong Zhu wrote:
There are 2 cases for "No space left on device":
1. Some tasks which use large temp space cannot run in any node.
2. The free space of the datanodes is not balanced. Some tasks which use large
temp space cannot run on several nodes, but they can run successfully on other
nodes.
Because most of our ca
I created a JIRA: https://issues.apache.org/jira/browse/SPARK-6353
On Mon, Mar 16, 2015 at 5:36 PM, Jianshi Huang wrote:
> Hi,
>
> We're facing "No space left on device" errors lately from time to time.
> The job will fail after retries. Obviously, in such cases retrying won't be
> helpful.
>
> Sure