kalyan
Date 02/6/2024 10:08
To Jay Han
Cc Ashish Singh, Mridul Muralidharan, dev
Subject Re: [Spark-Core] Improving Reliability of spark when Executors OOM
Hey,
Insufficient disk space is also a reliability concern, but it might need a
different strategy to handle it.
As suggested by Mridul, I am working on making things more configurable in
another (new) module. With that, we can plug in new rules for each type of
error.
Regards
Kalyan.
On Mon, 5 Feb 2024 at
Hi,
What about also handling the disk space problem of "device space
isn't enough"? I think it's the same as the OOM exception.
On Sat, Jan 27, 2024 at 13:00, kalyan wrote:
Hi all,
Sorry for the delay in getting the first draft of (my first) SPIP out.
https://docs.google.com/document/d/1hxEPUirf3eYwNfMOmUHpuI5dIt_HJErCdo7_yr9htQc/edit?pli=1
Let me know what you think.
Regards
kalyan.
On Sat, Jan 20, 2024 at 8:19 AM Ashish Singh wrote:
Hey all,
Thanks for this discussion, the timing of this couldn't be better!
At Pinterest, we recently started to look into reducing OOM failures while
also reducing memory consumption of Spark applications. We considered the
following options.
1. Changing core count on executor to change memory a
Hi,
We are internally exploring adding support for dynamically changing the
resource profile of a stage based on runtime characteristics.
This includes failures due to OOM and the like, slowness due to excessive
GC, resource wastage due to excessive overprovisioning, etc.
Essentially handles sca
It is interesting. I think there are definitely some discussion points around
this. Reliability vs. performance is always a trade-off, and it's great that it
doesn't fail, but if it now doesn't meet someone's SLA, that could be just as
bad if it's hard to figure out why. I think if something like this kicks
Oh, interesting solution. A co-worker was suggesting something similar using
resource profiles to increase memory, but your approach avoids a lot of
complexity. I like it (and we could extend it out to support resource
profile growth too).
I think an SPIP sounds like a great next step.
On Tue, Ja
Hello All,
At Uber, we recently did some work on improving the reliability of
Spark applications in scenarios where fatter executors run out of memory,
leading to application failure. Fatter executors are those that have more
than one task running on them concurrently. This has