Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-09-03 Thread Mich Talebzadeh
Launching both the driver and the executors using lazy executor IDs can introduce complexity, but it could be a viable strategy in certain scenarios. Basically, your mileage may vary. Pros: 1. Faster startup: launching the driver and initial executors simultaneo

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-28 Thread Mich Talebzadeh
Thanks Qian for your feedback. I will have a look. Regards, Mich

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-27 Thread Qian Sun
Hi Mich, ImageCache is an Alibaba Cloud ECI feature[1]. An image cache is a cluster-level resource that you can use to accelerate the creation of pods in different namespaces. If the Spark image needs to be updated, an image cache will be created in the cluster. Then specify a pod annotation to use image cac

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-25 Thread Mich Talebzadeh
Hi Qian, How in practice have you implemented image caching for the driver and executor pods respectively? Thanks

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-23 Thread Holden Karau
One option could be to initially launch both drivers and initial executors (using the lazy executor ID allocation), but it would introduce a lot of complexity.

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-23 Thread Qian Sun
Hi Mich, I agree that the startup time of Spark on Kubernetes clusters needs to be improved. Regarding fetching the image directly, I have used ImageCache to store the images on the node, eliminating the time required to pull images from a remote repository, which does ind

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-23 Thread Mich Talebzadeh
Hi all, In this conversation, one of the issues I brought up was the driver start-up time. This is especially a problem in k8s. As Spark on k8s is modeled on the Spark standalone scheduler, Spark on k8s consists of a single driver pod (like the “master” on standalone) and a number of executors (“workers”). When e

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Mich Talebzadeh
Splendid idea. 👍

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Holden Karau
The driver itself is probably another topic; perhaps I’ll make a “faster Spark start time” JIRA and a DA JIRA and we can explore both.

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Mich Talebzadeh
From my own perspective, faster execution time, especially with Spark on tin boxes (Dataproc & EC2) and Spark on k8s, is something that customers often bring up. Poor time to onboard with autoscaling seems to be particularly singled out for heavy ETL jobs that use Spark. I am disappointed to see the

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread kalyan
+1 to enhancements in DEA. Long overdue! There are a few things along the same lines that I have been thinking about for some time now (a few overlap with @holden's points): 1. How to reduce wastage on the RM side? Sometimes the driver asks for some units of resources, but when the RM provisions them, the driver c

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread Mich Talebzadeh
Thanks for pointing out this feature to me. I will have a look when I get there.

Re: [Internet]Re: Improving Dynamic Allocation Logic for Spark 4+

2023-08-08 Thread 齐赫
Spark 3.5 added a method `supportsReliableStorage` to `ShuffleDriverComponents`, which indicates whether shuffle data is written to a distributed filesystem or persisted in a remote shuffle service. Uniffle is a general-purpose remote shuffle service (https://github.com/apache/incubat