subject:"\[Spark\-Core\] Improving Reliability of spark when Executors OOM"

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-03-18 Thread Mridul Muralidharan

alyan >> Date: 02/6/2024 10:08 >> To: Jay Han >> Cc: Ashish Singh, Mridul Muralidharan, dev >> Subject: Re: [Spark-Core] Improving Reliability of spark when Executors OOM >> Hey, >> Disk space not enough is

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-03-11 Thread Ashish Singh

kalyan > Date: 02/6/2024 10:08 > To: Jay Han > Cc: Ashish Singh, Mridul Muralidharan, dev > Subject: Re: [Spark-Core] Improving Reliability of spark when Executors OOM > Hey, > Disk space not enough is also a reliability concern, but might need

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-02-05 Thread kalyan

Hey, Insufficient disk space is also a reliability concern, but it might need a different strategy to handle it. As suggested by Mridul, I am working on making things more configurable in another (new) module… with that, we can plug in new rules for each type of error. Regards, Kalyan. On Mon, 5 Feb 2024 at

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-02-04 Thread Jay Han

Hi, what about also supporting the disk space problem, "device space isn't enough"? I think it's the same as the OOM exception. On Sat, Jan 27, 2024 at 13:00, kalyan wrote: > Hi all, > > Sorry for the delay in getting the first draft of (my first) SPIP out. > > https://docs.google.com/document/d/1hxEPUirf3eYw

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-26 Thread kalyan

Hi all, Sorry for the delay in getting the first draft of (my first) SPIP out. https://docs.google.com/document/d/1hxEPUirf3eYwNfMOmUHpuI5dIt_HJErCdo7_yr9htQc/edit?pli=1 Let me know what you think. Regards kalyan. On Sat, Jan 20, 2024 at 8:19 AM Ashish Singh wrote: > Hey all, > > Thanks for t

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-19 Thread Ashish Singh

Hey all, Thanks for this discussion, the timing of this couldn't be better! At Pinterest, we recently started to look into reducing OOM failures while also reducing memory consumption of spark applications. We considered the following options. 1. Changing core count on executor to change memory a

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-17 Thread Mridul Muralidharan

Hi, We are internally exploring adding support for dynamically changing the resource profile of a stage based on runtime characteristics. This includes failures due to OOM and the like, slowness due to excessive GC, resource wastage due to excessive overprovisioning, etc. Essentially handles sca

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-17 Thread Tom Graves

It is interesting. I think there are definitely some discussion points around this. Reliability vs. performance is always a trade-off, and it's great that it doesn't fail, but if it no longer meets someone's SLA, that could be just as bad if it's hard to figure out why. I think if something like this kicks

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread Holden Karau

Oh, interesting solution. A co-worker was suggesting something similar using resource profiles to increase memory, but your approach avoids a lot of complexity. I like it (and we could extend it to support resource profile growth too). I think an SPIP sounds like a great next step. On Tue, Ja

[Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread kalyan

Hello All, At Uber, we recently did some work on improving the reliability of Spark applications in scenarios where fatter executors run out of memory and cause the application to fail. Fatter executors are those that have more than one task running concurrently at a given time. This has

10 matches
