Thanks for the quick follow-up, Reynold and Patrick. Tried a run with a
significantly higher ulimit; it doesn't seem to help. The executors have 35GB
each. Btw, with a recent version of the branch, the error message is "fetch
failures" as opposed to "too many open files". Not sure if they are
related.
Ah I see it was SPARK-2711 (and PR1707). In that case, it's possible
that you are just having more spilling as a result of the patch and so
the filesystem is opening more files. I would try increasing the
ulimit.
How much memory do your executors have?
- Patrick
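(For context only, not something from this thread: a rough sketch of the Spark 1.1-era shuffle settings that influence how many spill/shuffle files are open at once. The exact values are illustrative; raising the OS ulimit, as Patrick suggests, remains the direct fix.)

import org.apache.spark.{SparkConf, SparkContext}

// Sketch: shuffle settings that affect open-file pressure around Spark 1.1.x.
val conf = new SparkConf()
  .setAppName("shuffle-file-pressure-sketch")
  // Consolidate the many small per-reducer map outputs into fewer, larger
  // files, reducing the number of simultaneously open file handles.
  .set("spark.shuffle.consolidateFiles", "true")
  // Fraction of executor memory available for shuffle aggregation before
  // spilling to disk; more spilling means more files get opened.
  .set("spark.shuffle.memoryFraction", "0.3")
val sc = new SparkContext(conf)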
On Sun, Sep 21, 2014 at 10:29 PM,
Hey the numbers you mentioned don't quite line up - did you mean PR 2711?
On Sun, Sep 21, 2014 at 8:45 PM, Reynold Xin wrote:
> It seems like you just need to raise the ulimit?
>
>
> On Sun, Sep 21, 2014 at 8:41 PM, Nishkam Ravi wrote:
>
>> Recently upgraded to 1.1.0. Saw a bunch of fetch failures for one of the
>> workloads.
Hi Evan,
Sorry that I forgot to mention it. I set the value of K to 10 for the
benchmark study.
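(Not from the thread itself, just to make the parameter concrete: a small, self-contained MLlib example with K = 10 on made-up data; the benchmark's real data set and iteration count are not shown here.)

import org.apache.spark.SparkContext
import org.apache.spark.mllib.clustering.KMeans
import org.apache.spark.mllib.linalg.Vectors

object KMeansK10Sketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "kmeans-k10-sketch")

    // Toy 2-D points standing in for the benchmark data.
    val rnd = new scala.util.Random(42)
    val points = sc.parallelize(Seq.fill(1000)(
      Vectors.dense(rnd.nextDouble() * 10, rnd.nextDouble() * 10)
    )).cache()

    // K and maxIterations dominate runtime: per-iteration cost grows
    // roughly linearly with K, which is why Evan asks about it.
    val model = KMeans.train(points, 10, 20)

    model.clusterCenters.foreach(println)
    sc.stop()
  }
}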
On Friday 19 September 2014 11:24 PM, Evan R. Sparks wrote:
Hey Meethu - what are you setting "K" to in the benchmarks you show?
This can greatly affect the runtime.
On Thu, Sep 18, 2014 at
It seems like you just need to raise the ulimit?
On Sun, Sep 21, 2014 at 8:41 PM, Nishkam Ravi wrote:
> Recently upgraded to 1.1.0. Saw a bunch of fetch failures for one of the
> workloads. Tried tracing the problem through change set analysis. Looks
> like the offending commit is 4fde28c from Aug 4th for PR1707.
Recently upgraded to 1.1.0. Saw a bunch of fetch failures for one of the
workloads. Tried tracing the problem through change set analysis. Looks
like the offending commit is 4fde28c from Aug 4th for PR1707. Please see
SPARK-3633 for more details.
Thanks,
Nishkam
Hmm, good point, this seems to have been broken by refactorings of the
scheduler, but it worked in the past. Basically the solution is simple -- in a
result stage, we should not apply the update for each task ID more than once --
the same way we don't call job.listener.taskSucceeded more than once.
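(A standalone sketch of the idea Matei describes, with made-up names, not the actual DAGScheduler code: in a result stage, merge a task's accumulator updates only the first time its partition completes, so a resubmitted or speculative duplicate cannot double-count.)

import scala.collection.mutable

class ResultStageAccumUpdates[T](merge: (T, T) => T, zero: T) {
  private val mergedPartitions = mutable.HashSet[Int]()
  private var value: T = zero

  def taskSucceeded(partitionId: Int, update: T): Unit = {
    // Apply the update at most once per partition, the same way
    // job.listener.taskSucceeded is only invoked once per partition.
    if (mergedPartitions.add(partitionId)) {
      value = merge(value, update)
    }
  }

  def currentValue: T = value
}

object DedupExample extends App {
  val counts = new ResultStageAccumUpdates[Long](_ + _, 0L)
  counts.taskSucceeded(0, 5L)
  counts.taskSucceeded(0, 5L) // duplicate completion for the same partition: ignored
  counts.taskSucceeded(1, 3L)
  println(counts.currentValue) // prints 8, not 13
}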
Hi, Matei,
Can you give some hint on how the current implementation guarantees that the
accumulator is only applied once?
There is a pending PR trying to achieve this
(https://github.com/apache/spark/pull/228/files), but from the current
implementation, I didn't see that this has been done? (may
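(To illustrate the semantics being asked about, not code from PR 228: if a result-stage task's accumulator updates were merged into the driver-side value more than once, for example for a re-run of the same partition, a simple counter like this would over-count.)

import org.apache.spark.SparkContext

object AccumulatorOnceExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local[*]", "accumulator-once-example")
    val processed = sc.accumulator(0L)

    sc.parallelize(1 to 1000, 8).foreach { _ =>
      processed += 1L
    }

    // Should be exactly 1000 only if each partition's updates are
    // applied to the driver-side value a single time.
    println(processed.value)
    sc.stop()
  }
}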