One issue I've seen is that after about 24 hours, the SparkApplication job pods seem to be getting evicted. I've installed Spark History Server and am verifying this. It could be due to resource constraints; I'm checking that.
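One way to check the resource-constraint theory is to read the eviction message that Kubernetes records on each evicted pod, since it usually names the resource that was under pressure. Below is a minimal sketch, assuming `kubectl` is on the PATH and already pointed at the cluster; the function names `evicted_pods` and `fetch_pods` are made up for illustration.

```python
import json
import subprocess


def evicted_pods(pod_list):
    """Return (name, node, message) for each evicted pod.

    `pod_list` is the parsed JSON from `kubectl get pods -o json`.
    """
    found = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # An evicted pod is reported with phase=Failed and reason=Evicted.
        if status.get("phase") == "Failed" and status.get("reason") == "Evicted":
            found.append((
                pod["metadata"]["name"],
                pod.get("spec", {}).get("nodeName", ""),
                status.get("message", ""),
            ))
    return found


def fetch_pods(namespace="spark-apps"):
    """Run kubectl and parse its JSON output (assumes kubectl is configured)."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Calling `evicted_pods(fetch_pods("spark-apps"))` would list each evicted pod with its node and the recorded message. Evicted pod records can be cleaned up over time, so an empty result does not rule out evictions.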
Pls note: the Kubeflow Spark Operator is installed in namespace so350, and the Spark applications and Spark History Server are installed in namespace spark-apps.

This is currently running on GKE, and the node pools have autoscaling enabled. I don't see the number of nodes increasing, so I still need to understand why the pods in namespace spark-apps are getting evicted by Kubernetes.

If anyone has any input on this, pls let me know.

thanks!

On Sun, Apr 6, 2025 at 8:24 PM karan alang <karan.al...@gmail.com> wrote:

> Thanks, Megh !
>
> I did some research and realized the same - PVC is not a good option for
> spark shuffle, primarily for latency issues.
> The same is the case with S3 or MinIO.
>
> I've implemented option 2, and am testing this out currently: Storing
> data in host path is possible
>
> regds,
> Karan Alang
>
> On Sun, Apr 6, 2025 at 7:08 PM megh vidani <vidanimeg...@gmail.com> wrote:
>
>> Hello Karan,
>>
>> Apart from Celeborn, there is Apache Uniffle (Incubating) as well. We
>> also have similar setup as yours and we're trying out a PoC with Uniffle
>> right now.
>>
>> What I've gathered so far is, with Uniffle:
>> 1. Storing data in PVCs is not well supported
>> 2. Storing data in host path is possible
>> 3. Storing data in HDFS is possible, but I'm not sure about HDFS
>> compatible S3 (e.g. MinIO) storage yet, we're trying it out
>>
>> Thanks,
>> Megh
>>
>> On Tue, Apr 1, 2025, 02:43 karan alang <karan.al...@gmail.com> wrote:
>>
>>> seems apache-celeborn is also an option, if anyone has used this pls let
>>> me know.
>>>
>>> thanks!
>>>
>>> On Mon, Mar 31, 2025 at 1:58 PM karan alang <karan.al...@gmail.com>
>>> wrote:
>>>
>>>> hello all - checking to see if anyone has any input on this
>>>>
>>>> thanks!
>>>>
>>>> On Tue, Mar 25, 2025 at 12:22 PM karan alang <karan.al...@gmail.com>
>>>> wrote:
>>>>
>>>>> hello All,
>>>>>
>>>>> I have kubeflow Spark Operator installed on k8s and from what i
>>>>> understand - Spark Shuffle is not officially supported on kubernetes.
>>>>>
>>>>> Looking for feedback from the community on what approach is being
>>>>> taken to handle this issue - especially since dynamicAllocation cannot be
>>>>> enabled without Spark Shuffle.
>>>>>
>>>>> for eg.
>>>>> Does storing the shuffle data in PVC help ?
>>>>>
>>>>> Pls let me know.
>>>>>
>>>>> tia!
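On the original question about dynamicAllocation without an external shuffle service: Spark 3.0 and later have a shuffle-tracking setting that lets dynamic allocation run without one, by keeping executors that still hold shuffle data alive. The sketch below only assembles the relevant settings; nobody in this thread has reported using it. The helper names and executor bounds are hypothetical, and whether it behaves well for long-running jobs would need testing.

```python
def dynamic_allocation_conf(min_executors=1, max_executors=10):
    """Build Spark conf entries for dynamic allocation with shuffle tracking.

    The executor bounds are placeholder values, not recommendations.
    """
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",
        # Tracks shuffle files on executors instead of relying on an
        # external shuffle service (available since Spark 3.0).
        "spark.dynamicAllocation.shuffleTracking.enabled": "true",
        "spark.dynamicAllocation.minExecutors": str(min_executors),
        "spark.dynamicAllocation.maxExecutors": str(max_executors),
    }


def to_submit_args(conf):
    """Flatten a conf dict into spark-submit style `--conf key=value` arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args
```

With the Spark Operator, the same key/value pairs could go under `sparkConf` in the SparkApplication spec rather than on a spark-submit command line.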