Re: What is correct behavior for spark.task.maxFailures?

2017-04-24 Thread Ryan Blue
Chawla, We hit this issue, too. I worked around it by setting spark.scheduler.executorTaskBlacklistTime=5000. The problem for us was that the scheduler was using locality to select the executor, even though it had already failed there. The executor task blacklist time controls how long the scheduler …
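A minimal sketch of Ryan's workaround, assuming a plain SparkConf-based setup (the key is a legacy scheduler setting from Spark 1.x, and 5000 is the millisecond value he cites):

    import org.apache.spark.SparkConf

    // Keep a failed executor off a task's candidate list for 5 s, so the
    // scheduler's locality preference cannot keep retrying on the same host.
    val conf = new SparkConf()
      .setAppName("blacklist-workaround")
      .set("spark.scheduler.executorTaskBlacklistTime", "5000")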

Re: What is correct behavior for spark.task.maxFailures?

2017-04-24 Thread Ryan Blue
Looking at the code a bit more, it appears that blacklisting is disabled by default. To enable it, set spark.blacklist.enabled=true. The updates in 2.1.0 appear to provide much more fine-grained settings for this, like the number of tasks that can fail before an executor is blacklisted for a stage …
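As a sketch, enabling this on 2.1.0 could look like the following; the two fine-grained keys are examples from the spark.blacklist.* namespace and are worth checking against the docs for your exact release:

    import org.apache.spark.SparkConf

    // Blacklisting is off by default. The finer-grained knobs bound how many
    // times a task may fail on one executor, and how many tasks may fail on
    // an executor, before it is blacklisted for the stage.
    val conf = new SparkConf()
      .set("spark.blacklist.enabled", "true")
      .set("spark.blacklist.task.maxTaskAttemptsPerExecutor", "1")
      .set("spark.blacklist.stage.maxFailedTasksPerExecutor", "2")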

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Michael Allman
The trouble we ran into is that this upgrade was blocking access to our tables, and we didn't know why. This sounds like a kind of migration operation, but it was not apparent that this was the case. It took an expert examining a stack trace and source code to figure this out. Would a more naive …

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Holden Karau
It On Mon, Apr 24, 2017 at 10:33 AM, Michael Allman wrote: > The trouble we ran into is that this upgrade was blocking access to our > tables, and we didn't know why. This sounds like a kind of migration > operation, but it was not apparent that this was the case. It took an > expert examining a …

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Holden Karau
Whoops, sorry finger slipped on that last message. It sounds like whatever we do is going to break some existing users (either with the tables by case sensitivity or with the unexpected scan). Personally I agree with Michael Allman on this, I believe we should use INFER_NEVER for 2.1.1. On Mon, …

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Eric Liang
-1 (non-binding) I also agree with using NEVER_INFER for 2.1.1. The migration cost is unexpected for a point release. On Mon, Apr 24, 2017 at 11:08 AM Holden Karau wrote: > Whoops, sorry finger slipped on that last message. > It sounds like whatever we do is going to break some existing users …

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Michael Armbrust
Yeah, I agree. -1 (binding) This vote fails, and I'll cut a new RC after #17749 is merged. On Mon, Apr 24, 2017 at 12:18 PM, Eric Liang wrote: > -1 (non-binding) > > I also agree with using NEVER_INFER for 2.1.1. The migration cost is > unexpected for a point release. …

Spark Conf Problem with CacheLoader

2017-04-24 Thread John Compitello
Hey all, I’ve been working on contributing to Spark a little bit in the past few weeks, but I’ve suddenly encountered a problem I’m having some trouble with. Specifically, no matter how I’ve built Spark on my laptop, I am unable to create a SparkConf. I’ve done it in the past, but somehow it’s now broken …
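For readers hitting the same thing, a minimal smoke test (the object name is illustrative) that isolates SparkConf construction from the rest of a job; if this throws, e.g. out of Guava's CacheLoader as in the subject line, the locally built assembly and its shaded dependencies are the first place to look:

    import org.apache.spark.SparkConf

    // Instantiating SparkConf exercises Spark's config machinery and its
    // bundled dependencies, so a failure here points at the build, not user code.
    object ConfSmokeTest {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("conf-smoke-test").setMaster("local[*]")
        println(conf.toDebugString)
      }
    }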

Re: branch-2.2 has been cut

2017-04-24 Thread Josh Rosen
I've created the Jenkins jobs for branch-2.2, including the nightly snapshot, packaging, and docs jobs. You can view the latest nightly package at https://home.apache.org/~pwendell/spark-nightly/spark-branch-2.2-bin/latest/ and nightly docs at https://home.apache.org/~pwendell/spark-nightly/spark- …

Re: What is correct behavior for spark.task.maxFailures?

2017-04-24 Thread Chawla,Sumit
Thanks a lot @Dongjin, @Ryan. I am using Spark 1.6. I agree with your assessment, Ryan. Further investigation suggested that our cluster was probably at 100% capacity at that point in time. Though tasks were failing on that slave, it was still accepting tasks, and task retries exhaust …
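For context, spark.task.maxFailures is the per-task attempt limit before the whole job is aborted (4 by default). A minimal sketch of raising it, which only buys more retries and does not fix a saturated node:

    import org.apache.spark.SparkConf

    // Allow up to 8 attempts of any single task before failing the job.
    val conf = new SparkConf()
      .set("spark.task.maxFailures", "8")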

Re: [VOTE] Apache Spark 2.1.1 (RC3)

2017-04-24 Thread Wenchen Fan
see https://issues.apache.org/jira/browse/SPARK-19611 On Mon, Apr 24, 2017 at 2:22 PM, Holden Karau wrote: > What's the regression this fixed in 2.1 from 2.0? > > On Fri, Apr 21, 2017 at 7:45 PM, Wenchen Fan > wrote: > >> IIRC, the new "spark.sql.hive.caseSensitiveInferenceMode" stuff will …
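The enumeration value for the setting under discussion is NEVER_INFER (the thread also spells it INFER_NEVER). A minimal sketch of opting out of schema inference on 2.1.x, which skips the scan of the table's underlying files and trusts the case-insensitive metastore schema:

    import org.apache.spark.sql.SparkSession

    // NEVER_INFER restores the pre-2.1.1 behavior: no inference pass over
    // a table's data files when reading Hive metastore tables.
    val spark = SparkSession.builder()
      .appName("never-infer-example")
      .config("spark.sql.hive.caseSensitiveInferenceMode", "NEVER_INFER")
      .enableHiveSupport()
      .getOrCreate()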